PREFACE

The object of this work is to provide a mathematical text on the 
Theory of Statistics, adapted to the needs of the student with an 
average mathematical equipment, including an ordinary knowledge 
of the Integral Calculus. The subject treated in the following 
pages is best described not as Statistical Methods but as Statistical 
Mathematics, or the mathematical foundations of the interpretation 
of statistical data. The writer's aim is to explain the underlying 
principles, and to prove the formulae and the validity of the 
methods which are the common tools of statisticians. Numerous 
examples are given to illustrate the use of these formulae ; but, in 
nearly all cases, heavy arithmetic is purposely avoided in the desire 
to focus the attention on the principles and proofs, rather than on 
the details of numerical calculation. 

The treatment is based on a course of about sixty lectures on 
Statistical Mathematics, which the author has given annually in 
the University of Western Australia for several years. This course 
was undertaken at the request of the heads of some of the science 
departments, who desired for their students a more mathematical 
treatment of the subject than that usually provided by courses
on Statistical Methods. The class has included graduates and 
undergraduates whose researches and studies were in Agriculture, 
Biology, Economics, Psychology, Physics and Chemistry. On 
account of such a diversity of interest the lectures were designed 
to provide a mathematical basis, suitable for work in any of the 
above subjects. No technical knowledge of any particular subject 
was assumed. 

The first five chapters deal with the properties of distributions 
in general, and of some standard distributions in particular. It is 
desirable that the student become familiar with these, before 
being confronted with the theory of sampling, in which he is 
required to consider two or more distributions simultaneously. 
However, the reader who wishes to make an earlier start with 
sampling theory may take Chapters VI and VII (as far as §60)
immediately after Chapter III, since these are independent of the
theory of Correlation. The theory of Partial and Multiple Correlations
has been left for the final chapter, in order not to delay the
study of sampling theory and tests of significance. But those who 
wish to study this subject earlier may read this chapter immediately
after Chapter V, or even after Chapter IV. This order,
however, is not recommended for the beginner. 

A feature of the book is the use of the properties of Beta and 
Gamma variates in proving the sampling distributions of the 
statistics, which are the basis of the common tests of significance. 
Consequently a special chapter (VIII) is devoted to the properties
of these variates and their distributions. The treatment is simple, 
and does not assume any previous acquaintance with the Beta and 
Gamma functions. The author believes that the use of these 
variates brings both simplicity and cohesion to the theory. The 
student is strongly urged to master the theorems of Chapter VIII
before proceeding to tests of significance. In the preparation of 
this chapter, and the following one, much help was derived from 
the study of a recent paper by D. T. Sawkins.* 

The considerations, which determined the presentation of the 
subject of Probability in Chapter II, are the mathematical attainments
of the students for whom the book is intended, and their
requirements in studying the remaining chapters. The approach 
decided on is substantially that of the classical theory. After the 
proof of Bernoulli's theorem, the relation between the a priori 
definition of probability and the statistical (or empirical) definition 
is considered; and the measure of probability by a relative fre- 
quency in each case is emphasized. If the book had been intended 
primarily for mathematical specialists, a different presentation of 
the theory would have been given. But it is futile to expect the 
average research worker to appreciate an exposition like Kolmogoroff's†
or Cramér's,‡ based on the theory of completely additive set functions.
The elementary properties of the moment generating function and the
cumulative function are also given in Chapter II, and are used
throughout the book. These functions are introduced, not as essential
concepts, but as useful instruments which lead to simpler proofs of
various theorems. What has been written about them should convince the
reader that they deserve his careful attention.

* 'Elementary presentation of the frequency distributions of certain
statistical populations associated with the normal population.' Journ. and
Proc. Roy. Soc. N.S.W. vol. 74, pp. 209-39. By D. T. Sawkins, Reader in
Statistics at Sydney University.

† Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin, 1933.

‡ Random Variables and Probability Distributions, University Press,
Cambridge, 1937.

I wish to express my appreciation of the care bestowed on this 
book by the staff of the Cambridge University Press, and my 
pleasure in the excellence of the printing. My thanks are due to 
Professor R. A. Fisher and Messrs Oliver and Boyd for permission 
to print Tables 3, 4, 5 and 7, which are drawn from fuller tables 
in Fisher's Statistical Methods for Research Workers. I am also 
indebted to Professor G. W. Snedecor and the Iowa Collegiate 
Press for permission to reproduce Table 6, which is extracted from 
a more complete table in Snedecor's Statistical Methods. Lastly
I wish to thank Mr D. T. Sawkins for help received in corre- 
spondence concerning statistical theory, and Mr Frank Gamblen 
for assistance in reading the final proof. 

C.E.W. 

PERTH, W.A. 

April, 1946



NOTE ON THE SECOND EDITION 

The call for reprinting has given an opportunity to correct a
number of small errors and misprints throughout the book, and 
to add a new reference here and there. Paragraph 91 on page 195 
is new to this edition. 

1949 C.E.W.



CHAPTER I 
FREQUENCY DISTRIBUTIONS 

1. Arithmetic mean. Partition values 

Consider a group of N persons in receipt of wages. Let x shillings 
be the wage of an individual on some specified day. Then, in general, 
x will be a variable whose value changes with the individual. 
Possibly the N values of x will not all be different. Suppose there are 
only $n$ different values $x_1, x_2, \ldots, x_n$, which occur respectively
$f_1, f_2, \ldots, f_n$ times. The numbers $f_i$, in which the subscript $i$ takes the
positive integral values 1, 2, ..., $n$, are called the frequencies of the
values $x_i$ of the variable $x$; and the assemblage of values $x_i$, with their
associated frequencies, is the frequency distribution of wages for that
group of persons on the day specified. The sum of the frequencies is
clearly equal to the number of persons in the group, so that

\[ N = f_1 + f_2 + \cdots + f_n = \sum_{i=1}^{n} f_i, \tag{1} \]

where, as the equation indicates, $\sum_{i=1}^{n} f_i$ denotes the sum of the $n$
frequencies $f_i$, $i$ taking the integral values 1 to $n$. When the range of
values of $i$ is understood, the sum will be denoted simply by $\sum_i f_i$ or
$\sum f_i$. The number $N$ is the total frequency. The mean of the distribution
is the arithmetic mean $\bar{x}$ of the $N$ values of the variable, and is
therefore given by

\[ \bar{x} = \frac{1}{N}(f_1 x_1 + f_2 x_2 + \cdots + f_n x_n) = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \tag{2} \]

since the value $x_i$ occurs $f_i$ times. This formula expresses what is
meant by saying that $\bar{x}$ is the weighted mean of the different values $x_i$,
whose weights are their frequencies $f_i$.

Frequency distributions of many different variables will occur in
the following pages. Thus the values of the variable $x$ may be the
heights, or the weights, or the ages of a group of persons, or the
yields of grain per acre from a number of plots of land. For each
finite distribution $f_i$ will denote the frequency of the value $x_i$. The
total frequency $N$ is then given by (1), and the mean $\bar{x}$ of the
distribution by (2).

Example. The student to whom the above summation notation is new, 
may profitably verify the following relations. If a is a constant, 



If $(x_i, y_i)$ is a pair of corresponding values of two variables, $x$ and $y$, with
frequency $f_i$,



The symbol used as a subscript in connection with summation is immaterial; 
but $i$, $j$, $r$, $s$, $t$ are perhaps most commonly employed.

Suppose that the frequency distribution of $x$ consists of $k$ partial
or component distributions, $\bar{x}_j$ being the mean of the $j$th component
and $n_j$ its total frequency, so that

\[ N = \sum_{j=1}^{k} n_j. \]

Then that part of the sum $\sum_i f_i x_i$ which belongs to the $j$th
component has the value $n_j \bar{x}_j$, and the relation (2) is equivalent to

\[ \bar{x} = \frac{1}{N}\sum_{j=1}^{k} n_j \bar{x}_j. \tag{3} \]

Consequently the mean of the whole distribution is the weighted 
mean of the means of its components, the weights being the total 
frequencies in those components. 
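
Relation (3) is easy to check numerically. The following is a minimal sketch in Python; the wage figures and the split into three components are hypothetical, chosen only for illustration, and exact fractions are used so that the two sides can be compared without rounding.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean of a list of integers, kept exact as a Fraction."""
    return Fraction(sum(values), len(values))


# Hypothetical wages (in shillings), split into k = 3 component groups.
components = [
    [20, 22, 22, 25],
    [30, 31],
    [18, 18, 19, 21, 24],
]

# Mean of the whole distribution, computed directly from all N values.
all_values = [x for group in components for x in group]
overall_mean = mean(all_values)

# Weighted mean of the component means, the weights being the
# component frequencies n_j, as in relation (3).
N = sum(len(group) for group in components)
weighted_mean = sum(len(group) * mean(group) for group in components) / N

print(overall_mean, weighted_mean)
```

Both printed values agree, whatever grouping of the same values is chosen.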

Again, let $u$ and $v$ be two variables with frequency distributions
in which a value of $v$ corresponds to each value of $u$. Then the values
of the variables occur in pairs. Let $N$ be the number of pairs of
values $(u_i, v_i)$, and let

\[ x = u + v. \]

The arithmetic mean of the $N$ values of $x$ is equal to that of the $N$
values of the second member, which is $\bar{u} + \bar{v}$. Consequently

\[ \bar{x} = \bar{u} + \bar{v}, \tag{4} \]

which expresses that the mean of the sum of two variables is equal
to the sum of their means; and the result can be extended to the sum
of any number of variables. The reader can prove similarly that, if
$a$ and $b$ are constants, and

\[ x = au + bv, \]

then

\[ \bar{x} = a\bar{u} + b\bar{v}. \tag{4'} \]

Suppose the N values of the variable in the distribution to be 
arranged in ascending order of magnitude. Then the median is the 
middle value, if N is odd; while, if N is even, it is the arithmetic 
mean of the middle pair, or, more generally, it may be regarded as 
any value in the interval between these middle values. Similarly, 
the quartiles, $Q_1$, $Q_2$, $Q_3$, are those values in the range of the variable
which divide the frequency into four equal parts, the second quartile
being identical with the median; and the difference between the
upper and lower quartiles, $Q_3 - Q_1$, is the interquartile range. The
deciles and percentiles are those values which divide the total fre- 
quency into ten and one hundred equal parts respectively. The 
median, quartiles, deciles and percentiles are often spoken of 
collectively as partition values, since each set of values divides the 
frequency into a number of equal parts. Sometimes they are 
referred to as quantiles. 

That value of the variable whose frequency is a maximum is 
called a mode, or modal value, of the distribution. When, as usually 
happens, there is only one mode, the distribution is said to be 
unimodal. 

We shall presently consider continuous frequency distributions. 
But it should be pointed out at once that the variable may be either 
continuous or discrete. A continuous variable is one which is capable 
of taking any value between certain limits; for example, the stature 
of an adult man. A discrete variable is one which can take only 
certain specified values, usually positive integers; for example, the
number of heads in a throw of ten coins, or the number of accidents
sustained by a worker exposed to a given risk for a given time. Of 
course an observed frequency distribution can only contain a finite 
number of values of the variable, and in this sense all observed 
frequency distributions are discrete. Nevertheless, the distinction 
between continuous and discrete variables will be found to be of 
importance when we come to study populations and probability 
distributions. In the next few sections we shall assume that the 
values of the variables are discrete. 

2. Change of origin and unit 

The following graphical representation will be found helpful.
Taking the usual $x$-axis with origin $O$, we may represent the variable
$x$ by the abscissa of the current point $P$. Then $\bar{x}$ is the abscissa of a
fixed point $G$. It is frequently convenient to take a new origin at
some point $A$, whose abscissa is $a$. Let $\xi$ be the abscissa of $P$ relative
to $A$ as origin. Then, since $OP = OA + AP$, we have

\[ x = a + \xi. \]

FIG. 1 (the points $A$, $G$, $P$ marked on the $x$-axis)

Thus $\xi$ is the excess of $x$ above $a$, or the deviation of $x$ from that
value. Taking the mean of each member of this equation we have

\[ \bar{x} = a + \bar{\xi}. \tag{5} \]

Thus $\bar{x} - a$, which is the deviation of the mean value of $x$ from $a$, is
equal to $\bar{\xi}$, which is the mean of the deviations of the values $x_i$ from
$a$. In particular, by taking $a$ as $\bar{x}$, we have the result that the sum of
the deviations of the values $x_i$ from their mean is zero. This is also
easily proved directly; for

\[ \sum_i f_i (x_i - \bar{x}) = \sum_i f_i x_i - \bar{x} \sum_i f_i = N\bar{x} - N\bar{x} = 0 \tag{6} \]

in virtue of (2).

In addition to choosing $A$ as origin it may be convenient to use
a different unit, say $c$ times the original unit. Then, if $u$ is the
deviation of $P$ from $A$ measured in terms of this new unit,

\[ u = (x - a)/c \quad \text{or} \quad x = a + cu. \tag{7} \]

Taking the mean value of the variable represented by either side
we have, in virtue of (4'),

\[ \bar{x} = a + c\bar{u}, \tag{8} \]

$\bar{u}$ being the mean value of $u$ for the distribution.
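
The change of origin and unit can be illustrated by a short computation. The sketch below is in Python; the heights and frequencies are hypothetical, and the choices $a = 160$, $c = 5$ are assumptions made only so that the working values $u$ come out as small integers.

```python
# Hypothetical distribution of heights (in cm) with their frequencies.
xs = [150, 155, 160, 165, 170]
fs = [3, 7, 12, 6, 2]

a = 160   # new origin (assumed, chosen near the middle of the range)
c = 5     # new unit, equal to the spacing of the values

N = sum(fs)

# Deviations from a, measured in the new unit: u = (x - a) / c.
us = [(x - a) / c for x in xs]
u_bar = sum(f * u for f, u in zip(fs, us)) / N

# Mean computed directly, and mean recovered from the working mean u_bar.
x_bar_direct = sum(f * x for f, x in zip(fs, xs)) / N
x_bar_via_u = a + c * u_bar

print(u_bar, x_bar_direct, x_bar_via_u)
```

The arithmetic on the small integers $u$ is much lighter than on the original values, which is the point of the device.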

Example. Eight coins were tossed together, and the number x of heads 
resulting was observed. The operation was performed 256 times; and the 
frequencies that were obtained for the different values of x are shown in the 
following table. Calculate the mean, the median and the quartiles of the 
distribution of x. 



  $x$      $f$     $\xi$    $f\xi$   $f\xi^2$   $f\xi^3$

   0        1      -4        -4        16        -64
   1        9      -3       -27        81       -243
   2       26      -2       -52       104       -208
   3       59      -1       -59        59        -59
   4       72       0         0         0          0
   5       52       1        52        52         52
   6       29       2        58       116        232
   7        7       3        21        63        189
   8        1       4         4        16         64

Totals    256                -7       507        -37



The different values of $x$ are shown in the first column, and their frequencies
in the second. The calculation is simplified by taking the value $x = 4$ as
origin of $\xi$. The values of $\xi$ corresponding to those of $x$ are given in the third
column, and those of the product $f\xi$ in the fourth. The remaining columns are
not needed for the present. Totals for the various columns are given in the
bottom row. Hence

\[ \bar{\xi} = \frac{1}{N}\sum_i f_i \xi_i = -7/256 = -0.027. \]

The mean value of $x$ is therefore

\[ \bar{x} = a + \bar{\xi} = 4 - 0.027 = 3.973. \]

The mode, being the value of x with the largest frequency, is clearly 4. To 
find the median we observe that the values of x are arranged in ascending 
order, and that the 128th and 129th are both 4. Hence the median is also 4.
Similarly the 64th and 65th values are both 3, so that the lower quartile is 3. 
In the same way we find that the upper quartile is 5. 
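
The results of this example may be checked mechanically. The following Python sketch uses the frequencies of the table; the helper `value_at_rank` is a hypothetical name introduced here, and the quartiles are taken, as in the text, from the values standing at the 64th and 65th, and at the 192nd and 193rd, places.

```python
# Number of heads x and observed frequency f, from the table above.
xs = list(range(9))
fs = [1, 9, 26, 59, 72, 52, 29, 7, 1]

N = sum(fs)                                  # total frequency, relation (1)
a = 4                                        # working origin
xi_bar = sum(f * (x - a) for x, f in zip(xs, fs)) / N
x_bar = a + xi_bar                           # relation (5)


def value_at_rank(k):
    """Value standing k-th (1-indexed) when the N values are in ascending order."""
    cumulative = 0
    for x, f in zip(xs, fs):
        cumulative += f
        if k <= cumulative:
            return x
    raise ValueError("rank out of range")


# N is even, so each partition value is the mean of a middle pair.
median = (value_at_rank(N // 2) + value_at_rank(N // 2 + 1)) / 2
lower_quartile = (value_at_rank(N // 4) + value_at_rank(N // 4 + 1)) / 2
upper_quartile = (value_at_rank(3 * N // 4) + value_at_rank(3 * N // 4 + 1)) / 2

# Mode: the value with the greatest frequency.
mode = max(zip(fs, xs))[1]

print(N, round(x_bar, 3), median, lower_quartile, upper_quartile, mode)
```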

3. Variance. Standard deviation

The mean square deviation of the variable $x$ from the value $a$ is,
as the name implies, the mean value of the square of the deviation
of $x$ from $a$. It is therefore given by $\frac{1}{N}\sum_i f_i \xi_i^2$. The positive square
root of this quantity is the root-mean-square deviation from $a$. In
the important case in which the deviation is taken from the mean of
the distribution, the mean square deviation is called the variance
of $x$, and is denoted by $\mu_2$. The reason for the notation will appear in
the next section. The positive square root of the variance is called
the standard deviation (S.D.) of $x$, and is denoted by $\sigma$. Thus

\[ \sigma^2 = \mu_2 = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^2. \tag{9} \]

The variance (or the S.D.) may be taken as an indication of the 
extent to which the values of x are scattered. This scattering is 
called dispersion. When the values of x cluster closely round the 
mean, the dispersion is small. When those values, whose deviations 
from the mean are large, have also relatively large frequencies, the 
dispersion is large. The concepts of variance and S.D. will play a 
prominent part in the following pages. 

When the mean square deviation from any value $a$ is known, and
also the deviation $\bar{\xi}$ of the mean from that value, the variance is
easily calculated. For

\[ \sigma^2 = \frac{1}{N}\sum_i f_i (\xi_i - \bar{\xi})^2 = \frac{1}{N}\sum_i f_i \xi_i^2 - 2\bar{\xi}\,\frac{1}{N}\sum_i f_i \xi_i + \bar{\xi}^2 = \frac{1}{N}\sum_i f_i \xi_i^2 - \bar{\xi}^2. \tag{10} \]

This formula is of great importance, and will be constantly employed.
On multiplying by $N$ we have an equivalent relation, which may be
expressed

\[ \sum_i f_i \xi_i^2 = N\sigma^2 + N\bar{\xi}^2. \tag{11} \]

Thus $N\sigma^2$ is less than $\sum_i f_i \xi_i^2$, showing that the sum of the squares of
the deviations of the values $x_i$ is least when the deviations are measured
from the mean.

Another possible measure of dispersion is the mean value of the
absolute deviation from the mean of the distribution, commonly 
called the mean deviation from the mean. This quantity, however, 
does not lend itself readily to algebraical treatment, and is therefore 
not nearly so important as the variance and the S.D. The semi- 
interquartile range is also sometimes taken as an indication of the 
magnitude of the dispersion. 

The significance of the magnitude of the standard deviation 
clearly depends upon the values of the variable. Thus a S.D. of 6 in. 
in the measurements of the height of a tower, is much less significant 
than an equal S.D. in the measurements of the height of a man. The 
ratio of the S.D. to the mean value of the variable is called the 
coefficient of variation. It is an absolute measure of dispersion in the 
sense that it is independent of the unit employed. And by means of 
this coefficient we are able to compare the variabilities of distribu- 
tions of different characteristics, such as weight and height. Some- 
times the coefficient of variation is defined as 100 times the above 
value, i.e. as the percentage of the mean which is equal to the S.D. 

Example 1. In the example of the preceding section the mean square
deviation of $x$ from the value 4 is 507/256 = 1.98. Hence the variance is
given by

\[ \sigma^2 = 1.98 - \bar{\xi}^2 = 1.98 - (0.027)^2 = 1.98, \]

and $\sigma = 1.407 = 1.41$ nearly.

From the $f\xi$ column it is clear that the sum of the absolute deviations from
$x = 4$ is 277. By measuring deviations from the mean, instead of from $x = 4$,
we increase the absolute deviations of 161 values, and decrease those of 95
values, by 0.027. Hence the sum of the absolute deviations from the mean is
277 + 66(0.027) = 278.78. The mean deviation is therefore 1.09 approximately.
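
The arithmetic of Example 1 can be reproduced as follows. This is a sketch in Python using the coin-tossing frequencies of §2; it computes the variance both by the short method (mean square deviation from $a = 4$, less the square of the deviation of the mean from $a$) and directly from deviations about the mean, and then the mean deviation.

```python
from math import sqrt

# Coin-tossing distribution of section 2.
xs = list(range(9))
fs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
N = sum(fs)
a = 4                                         # working origin

xi_bar = sum(f * (x - a) for x, f in zip(xs, fs)) / N
mean_square_about_a = sum(f * (x - a) ** 2 for x, f in zip(xs, fs)) / N

# Short method: variance = mean square about a, minus xi_bar squared.
variance = mean_square_about_a - xi_bar ** 2
sd = sqrt(variance)

# Direct computation from deviations about the mean, for comparison.
x_bar = a + xi_bar
direct_variance = sum(f * (x - x_bar) ** 2 for x, f in zip(xs, fs)) / N

# Mean of the absolute deviations from the mean.
mean_deviation = sum(f * abs(x - x_bar) for x, f in zip(xs, fs)) / N

print(round(variance, 2), round(sd, 2), round(mean_deviation, 2))
```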

Example 2. Find the mean and the variance for the distribution in which
the values of $x$ are the positive integers 1, 2, 3, ..., $N$, the frequency of each
being unity.

\[ \bar{x} = (1 + 2 + \cdots + N)/N = \tfrac{1}{2}(N + 1). \]

The mean square deviation from $x = 0$ is

\[ (1^2 + 2^2 + \cdots + N^2)/N = \tfrac{1}{6}(N + 1)(2N + 1). \]

Hence

\[ \sigma^2 = \tfrac{1}{6}(N + 1)(2N + 1) - \tfrac{1}{4}(N + 1)^2 = \tfrac{1}{12}(N^2 - 1). \]

Example 3. For the distribution expressed by

    $x$ =  5   6   7   8   9  10  11  12  13  14  15
    $f$ = 18  25  34  47  68  90  80  62  38  27  11

the total frequency is 500. Show that the mean value of $x$ is 10.054, the
variance 5.58, the S.D. 2.36, the median 10, and the lower and upper quartiles 9
and 12 respectively. Also calculate the mean deviation from the mean as
in Ex. 1.

Example 4. A distribution consists of several component distributions. 
Express the variance of the whole distribution in terms of those of the 
components and the deviations of the means of the components from the 
general mean. 

Let $n_j$ be the frequency in the $j$th component, $\sigma_j$ its S.D., and $d_j = \bar{x}_j - \bar{x}$ the
deviation of its mean from the general mean. Then the mean square deviation
of this component from the general mean is $d_j^2 + \sigma_j^2$, and the sum of the
squares of its deviations from the general mean is $n_j(\sigma_j^2 + d_j^2)$. Hence the
variance $\sigma^2$ of the whole distribution is given by

\[ N\sigma^2 = \sum_j n_j(\sigma_j^2 + d_j^2), \]

where $N$ is the total frequency $\sum_j n_j$.
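
The result of Example 4 can be verified on a small case. The sketch below is in Python with exact fractions; the three component distributions are hypothetical and serve only to show that the variance of the whole equals the weighted mean of the quantities $\sigma_j^2 + d_j^2$.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean of a list of integers."""
    return Fraction(sum(values), len(values))


def variance(values):
    """Exact mean square deviation of the values from their own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical component distributions (each value has frequency 1).
components = [[2, 4, 6], [10, 12], [1, 1, 3, 7]]

whole = [v for group in components for v in group]
N = len(whole)
general_mean = mean(whole)

# Sum over components of n_j * (sigma_j^2 + d_j^2), divided by N,
# where d_j is the deviation of the component mean from the general mean.
combined = sum(
    len(group) * (variance(group) + (mean(group) - general_mean) ** 2)
    for group in components
) / N

print(combined, variance(whole))
```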

4. Moments 

In the notation of the preceding sections the mean value of the
$r$th power of the deviation of the variable from the value $a$ is
$\sum_i f_i \xi_i^r / N$. This is usually called the $r$th moment of the distribution
about the value $a$, or the moment of order $r$. The term 'moment' is
borrowed from Mechanics. Since $f_i/N$ is the relative frequency of the
value $x_i$ in the distribution, and the deviation from $a$ is represented
by the distance $AP$, the above expression can be regarded as the
sum of the $r$th moments of the relative frequencies about $A$. The
$r$th moment about the mean of the distribution is denoted by $\mu_r$.
The corresponding moment about a specified value other than the
mean, will be denoted* by $\mu'_r$. Thus

\[ \mu_r = \frac{1}{N}\sum_i f_i (x_i - \bar{x})^r \tag{12} \]

is the $r$th moment about the mean, while

\[ \mu'_r = \frac{1}{N}\sum_i f_i (x_i - a)^r = \frac{1}{N}\sum_i f_i \xi_i^r \tag{13} \]

is the $r$th moment about the value $a$. Putting $r = 0$ we see that

\[ \mu_0 = \mu'_0 = 1. \tag{14} \]

Similarly, in virtue of (2) and (6), we have

\[ \mu'_1 = \bar{\xi}, \quad \mu_1 = 0. \tag{15} \]

The second moment about the mean is clearly the variance already
discussed.

* An alternative notation, with $\nu_r$ instead of $\mu'_r$, has some advantages.

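By means of the binomial expansion, moments about the mean
of the distribution may be expressed in terms of moments about
any other value, $x = a$. Thus

\[ \mu_r = \frac{1}{N}\sum_i f_i (\xi_i - \bar{\xi})^r = \mu'_r - \binom{r}{1}\bar{\xi}\mu'_{r-1} + \binom{r}{2}\bar{\xi}^2\mu'_{r-2} - \cdots + (-1)^r \bar{\xi}^r \mu'_0, \tag{16} \]

$\binom{r}{s}$ denoting the binomial coefficient, often written ${}^rC_s$ or $C^r_s$. In
particular, in virtue of (14) and (15),

\[ \mu_2 = \mu'_2 - 2\bar{\xi}^2 + \bar{\xi}^2\mu'_0 = \mu'_2 - \bar{\xi}^2, \tag{17} \]

in agreement with (10). Similarly

\[ \mu_3 = \mu'_3 - 3\bar{\xi}\mu'_2 + 2\bar{\xi}^3, \qquad \mu_4 = \mu'_4 - 4\bar{\xi}\mu'_3 + 6\bar{\xi}^2\mu'_2 - 3\bar{\xi}^4, \tag{18} \]

and so on.

In calculating moments it is frequently convenient to change the
unit. As in §2, let $u$ be the measure of the deviation from $x = a$, in
terms of a unit $c$ times the original unit, so that $\xi = cu$. Then the
$r$th moment of $x$ about $a$ is

\[ \frac{1}{N}\sum_i f_i \xi_i^r = c^r\,\frac{1}{N}\sum_i f_i u_i^r. \tag{19} \]

Thus the $r$th moment of the variable $x$ is $c^r$ times the corresponding
moment of the variable $u$.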

A distribution is said to be symmetrical when the frequencies are 
symmetrically distributed about the mean, that is to say, when 
values equidistant from the mean have equal frequencies. For
example, the distribution expressed by

    $x$ = 0   1   2   3   4   5   6   7   8
    $f$ = 1   8  28  56  70  56  28   8   1

is symmetrical about its mean $x = 4$. In the case of a symmetrical
distribution there is the simplification that all the moments of odd 
order about the mean are equal to zero, since the terms of the sum 
in (12) cancel in pairs. In the case of an unsymmetrical distribution, 
the degree of departure from symmetry is called its skewness. 
More than one measure of this property has been proposed. One of 
the simplest is $\mu_3/\sigma^3$, while another is half this expression. These are
clearly independent of the unit chosen for the variable, and they 
vanish if the distribution is symmetrical. Another measure of 
skewness, proposed by Karl Pearson, will be given later. 

Example 1. For the distribution of $x$ in the example of §2, the third moment
about $\xi = 0$ is

\[ \mu'_3 = -37/256 = -0.145. \]

Hence the third moment about the mean is given by

\[ \mu_3 = \mu'_3 - 3\bar{\xi}\mu'_2 + 2\bar{\xi}^3 = -0.145 - 3(-0.027)(1.98) + 2(-0.027)^3 = 0.018 \text{ nearly.} \]

The skewness, calculated from the formula $\mu_3/\sigma^3$, is

\[ 0.018/2.8 = 0.0064, \]

which is very small.

Example 2. For the distribution in §3, Ex. 3, show that $\mu_3 = -1.92$, and
deduce that $\mu_3/\sigma^3 = -0.146$.
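
The passage from moments about an arbitrary value to moments about the mean lends itself to a general routine. The following Python sketch applies the binomial expansion to the coin-tossing data of §2; the function names `moment_about` and `central_from_raw` are hypothetical, introduced only for this illustration.

```python
from math import comb


def moment_about(xs, fs, a, r):
    """r-th moment of the distribution about the value a."""
    N = sum(fs)
    return sum(f * (x - a) ** r for x, f in zip(xs, fs)) / N


def central_from_raw(raw, xi_bar):
    """Moment of order r about the mean, from the moments about a.

    raw[s] is the s-th moment about a, for s = 0..r; xi_bar is the
    deviation of the mean from a. Uses the binomial expansion term by term.
    """
    r = len(raw) - 1
    return sum(
        (-1) ** s * comb(r, s) * xi_bar ** s * raw[r - s]
        for s in range(r + 1)
    )


# Coin-tossing distribution of section 2, with working origin a = 4.
xs = list(range(9))
fs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
a = 4

raw = [moment_about(xs, fs, a, s) for s in range(4)]   # orders 0 to 3
xi_bar = raw[1]

mu2 = central_from_raw(raw[:3], xi_bar)
mu3 = central_from_raw(raw, xi_bar)
skewness = mu3 / mu2 ** 1.5

# The same third moment computed directly about the mean, as a check.
mu3_direct = moment_about(xs, fs, a + xi_bar, 3)

print(round(mu3, 3), round(skewness, 4))
```

The direct computation and the binomial route agree to within rounding error, as they must.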

5. Grouped distribution 

Frequently the number of different values of the variable repre- 
sented in the distribution is so large that, for convenience in cal- 
culating the moments, it becomes necessary to approximate by 
grouping the values. In such cases the range of variation of x is 
usually divided into a number of equal intervals. The group of 
values falling in a given interval constitutes a class; and the number
of such values is the class frequency. The magnitude of an interval
is called the class interval. For simplicity of calculation the number
of intervals chosen will not be too large, preferably not more than
20, or at the most 25; and, in order that the results may be suffi- 
ciently accurate, the number must not be too small, preferably not 
less than 12. The approximation in the calculation of moments 
consists in regarding each value as equal to the mid-value of the 
interval in which it falls. We shall later meet formulae,* known as 
Sheppard's adjustments, giving the corrections that may be applied 
under certain conditions to the approximate values of the moments 
calculated as above. We merely mention here that, under the 
conditions referred to, the correction to the mean may be neglected, 
while the calculated variance should be reduced by c 2 /12, c being 
the magnitude of the class interval. The following is an example of 
the treatment of a distribution by grouping. 

* See §16.

Example. Over a period of years, 570 students were examined in Mathematics I
at the annual examinations of the University of Western Australia.
The marks gained by students ranged from 0 to 99, all being integers. These
were grouped in 20 classes, with a class interval of 5, the class frequencies
being as shown in the accompanying table. The mid-values of the intervals
are 2, 7, 12, ..., 97. The calculation is simplified by taking the class interval
as a new unit ($c = 5$), and the value $x = 52$ as a new origin. The deviations $u$
from this origin, measured in class intervals, are shown in the fourth column.
The above choice of origin ensures that the larger frequencies are multiplied
by the smaller values of $u$. In the row corresponding to $u = 0$ the entries for
$fu$, $fu^2$, etc., are all zero, and need not be recorded. The space may therefore
be used to record the sums of the negative numbers above this row. The sums
of the positive numbers are similarly indicated at the bottom; and the total
for each column is given in the last row.

Percentages in Mathematics

Interval    Mid-value    $f$     $u$     $fu$     $fu^2$     $fu^3$

 0 to  4        2         12     -10     -120      1,200    -12,000
 5 to  9        7         13      -9     -117      1,053     -9,477
10 to 14       12         13      -8     -104        832     -6,656
15 to 19       17         14      -7      -98        686     -4,802
20 to 24       22         23      -6     -138        828     -4,968
25 to 29       27         23      -5     -115        575     -2,875
30 to 34       32         29      -4     -116        464     -1,856
35 to 39       37         34      -3     -102        306       -918
40 to 44       42         44      -2      -88        176       -352
45 to 49       47         44      -1      -44         44        -44
50 to 54       52         50       0   -1,042                -43,948
55 to 59       57         52       1       52         52         52
60 to 64       62         61       2      122        244        488
65 to 69       67         41       3      123        369      1,107
70 to 74       72         32       4      128        512      2,048
75 to 79       77         27       5      135        675      3,375
80 to 84       82         23       6      138        828      4,968
85 to 89       87         17       7      119        833      5,831
90 to 94       92         13       8      104        832      6,656
95 to 99       97          5       9       45        405      3,645
                                          966                 28,170

Totals                   570              -76     10,914    -15,778

The mean value of $u$ is therefore

\[ \bar{u} = -76/570 = -2/15 = -0.133, \]

and the mean percentage

\[ \bar{x} = a + c\bar{u} = 52 - 0.667 = 51.333. \]

The mean square deviation of $u$ from $u = 0$ has the value 10,914/570 = 19.15;
and the variance of $u$, which may be denoted by $\sigma_u^2$, is

\[ \sigma_u^2 = 19.15 - \bar{u}^2 = 19.13. \]

Consequently $\sigma_u = 4.374$, and therefore $\sigma_x = 21.87$. If, however, Sheppard's
adjustment is made, we find

\[ \sigma_u^2 = 19.047, \quad \sigma_u = 4.365, \quad \sigma_x = 21.82. \]

The third moment of $u$ about $u = 0$ has the value $-15,778/570 = -27.68$.
Thus, in virtue of (18),

\[ \mu_3 = -27.68 + 7.65 - 0.005 = -20.03. \]

The skewness of the distribution, calculated from the formula $\mu_3/\sigma^3$, is
$-0.24$ nearly. The reader may verify that the mean deviation of $x$ from the
mean is about 17.7.

The partition values may be estimated on the assumption that the fre- 
quency in any class is evenly distributed among the values in that class. The 
reader will find in this way that the median is 53, and the lower and upper 
quartiles 37 and 66 respectively. 
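
The whole of the grouped calculation can be retraced in a few lines. The Python sketch below takes the class frequencies from the table, with origin 52 and unit 5 as in the text; the adjustment of the variance by one-twelfth (in units of $u$, where the class interval is 1) is the correction described earlier in this section.

```python
from math import sqrt

# Mid-values of the 20 classes and the class frequencies from the table.
mid_values = list(range(2, 100, 5))          # 2, 7, 12, ..., 97
fs = [12, 13, 13, 14, 23, 23, 29, 34, 44, 44,
      50, 52, 61, 41, 32, 27, 23, 17, 13, 5]

a, c = 52, 5                                 # new origin and new unit
us = [(x - a) // c for x in mid_values]      # -10, -9, ..., 9 (exact division)

N = sum(fs)
raw1 = sum(f * u for f, u in zip(fs, us)) / N        # first moment about u = 0
raw2 = sum(f * u ** 2 for f, u in zip(fs, us)) / N   # second moment about u = 0
raw3 = sum(f * u ** 3 for f, u in zip(fs, us)) / N   # third moment about u = 0

u_bar = raw1
x_bar = a + c * u_bar                        # mean percentage

var_u = raw2 - u_bar ** 2                    # variance of u
sd_x = c * sqrt(var_u)                       # S.D. of x, unadjusted

var_u_adjusted = var_u - 1 / 12              # Sheppard's adjustment in u-units
sd_x_adjusted = c * sqrt(var_u_adjusted)

mu3_u = raw3 - 3 * u_bar * raw2 + 2 * u_bar ** 3     # third moment about the mean
skewness = mu3_u / var_u ** 1.5              # unaffected by the change of unit

print(round(x_bar, 3), round(sd_x, 2), round(sd_x_adjusted, 2),
      round(mu3_u, 2), round(skewness, 2))
```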

6. Continuous distributions 

The distributions considered so far are discrete distributions, or 
distributions of discrete values of the variable. A continuous dis- 
tribution is one in which the variable takes every value between 
certain limits, $a$ and $b$. The total frequency is therefore infinite; and
so is the frequency within any finite interval, $\alpha$ to $\beta$, of the range of
the variable. We shall confine our attention to continuous distributions,
which are such that the relative frequency in the infinitesimal
interval $x - \tfrac{1}{2}dx$ to $x + \tfrac{1}{2}dx$ is expressible as $f(x)\,dx$, where $f(x)$ is a
continuous function of $x$, called the relative frequency density. Then
the continuous curve

\[ y = f(x) \tag{20} \]

is the relative frequency curve for the distribution. If this curve is
symmetrical about some line $x = c$, the distribution is said to be
symmetrical. The infinitesimal interval $x - \tfrac{1}{2}dx$ to $x + \tfrac{1}{2}dx$, with
mid-value $x$ and magnitude $dx$, may be conveniently referred to as the
interval $dx$. The relative frequency $f(x)\,dx$ in this interval is represented
by the area under the curve (20), between the ordinates at
the ends of the interval. Hence the relative frequency for the
interval $\alpha$ to $\beta$ is given by the integral $\int_\alpha^\beta f(x)\,dx$. The sum of all the
relative frequencies is unity, so that

\[ \int_a^b f(x)\,dx = 1. \tag{21} \]



As in the case of a discrete distribution, the moment of order $r$
about a specified value is the sum of the $r$th moments of the relative
frequencies about that value. Hence the mean $\bar{x}$, which is the first
moment about the origin, is given by

\[ \bar{x} = \int_a^b x f(x)\,dx. \tag{22} \]

The moments of order $r$, about the origin and the mean, are

\[ \mu'_r = \int_a^b x^r f(x)\,dx, \qquad \mu_r = \int_a^b (x - \bar{x})^r f(x)\,dx. \]

And, in virtue of the binomial expansion, it follows as in §4 that
the relations (16)-(18) hold for a continuous distribution also. In
particular, the variance of the distribution is given by

\[ \sigma^2 = \mu_2 = \int_a^b x^2 f(x)\,dx - \bar{x}^2. \tag{23} \]

Example 1. By way of illustration consider a straight rod of length $l$.
The distance $x$ of a molecule of the rod from one end may be regarded as a
continuous variable, ranging from 0 to $l$; and the distribution of the values
of $x$ is then continuous. If $M$ is the mass of the rod, and the linear density $m$
is continuous, the function $f(x)$ for the distribution has the value $m/M$.

In the case of a uniform rod, of length $2a$, the distance $x$ of a molecule from
the middle point varies continuously from $-a$ to $a$, and the distribution of
$x$ is continuous with $f(x) = 1/(2a)$. The frequency curve is a straight line parallel
to the $x$-axis, and the distribution of $x$ is said to be rectangular or uniform.
The mean value of $x$ is now zero, and the variance is given by

\[ \sigma^2 = \int_{-a}^{a} \frac{x^2\,dx}{2a} = \frac{a^2}{3}. \]

All the moments of odd order about the mean are zero. The moment of order
$2n$ is easily shown to be $a^{2n}/(2n + 1)$.
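
The moments of the rectangular distribution can also be checked by numerical integration. The sketch below is in Python and uses a simple midpoint rule; the value $a = 2$, the number of steps, and the helper names `integrate` and `moment` are assumptions made for the illustration, and the results agree with the exact values only to within the error of the approximation.

```python
def integrate(g, lo, hi, steps=20_000):
    """Midpoint-rule approximation to the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


a = 2.0                                     # half-length of the rod (assumed)


def density(x):
    """Relative frequency density of the rectangular distribution on (-a, a)."""
    return 1 / (2 * a)


total = integrate(density, -a, a)           # should be unity
mean = integrate(lambda x: x * density(x), -a, a)


def moment(r):
    """Moment of order r about the mean."""
    return integrate(lambda x: (x - mean) ** r * density(x), -a, a)


variance = moment(2)                        # compare with a**2 / 3
third = moment(3)                           # odd order: compare with 0
fourth = moment(4)                          # compare with a**4 / 5

print(total, mean, variance, third, fourth)
```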

Example 2. Draw the frequency curve for the symmetrical distribution
in which

\[ f(x) = \frac{2a}{\pi(a^2 + x^2)}, \]

the range of the variable being $-a$ to $a$; and show that $f(x)$ satisfies the
condition (21). Also show that the variance is

\[ \sigma^2 = a^2(4 - \pi)/\pi = 0.273a^2, \]

and the S.D. $\sigma = 0.52a$, while

\[ \mu_4 = a^4\{1 - 8/(3\pi)\} = 0.151a^4. \]

It is apparent from the above that a discrete distribution cannot 
be identified with a continuous one. Sometimes, however, a discrete 
distribution $D_1$ approximates to a certain continuous distribution $D_2$
in the sense that, for any interval in the range of the variable, the
relative frequency of $D_1$ is approximately equal to that of $D_2$.
Such an approximation requires that the total frequency $N$ of $D_1$
should be large, and that there should be only very small intervals
between pairs of adjacent values of the variable. Consider, for
example, the distribution of ages of all living people, not more than
(say) 50 years old at a specified instant. The variable $x$ ranges from
0 to 50 years; and the frequency in any interval, not less than a few
hours, is so large that, since a period of a few hours is very small 
compared with 50 years, the distribution may be regarded as 
continuous for all practical purposes. If the necessary data
were available, we could calculate a continuous function f(x) to
give the relative frequency density to any reasonable degree of
accuracy.

FIG. 2

The diagram illustrates the frequency curve of an unsymmetrical,
unimodal, continuous distribution. The mode of the distribution is
the abscissa of the maximum ordinate NN'. Since the area under
the curve in any interval represents the relative frequency for that
interval, it follows that the median is the abscissa of the ordinate
MM', which bisects the area under the curve. And from (22) we see
that the mean is the abscissa of the ordinate GG', which passes
through the centroid of the area under the curve. We may mention
in passing that, when the skewness is small, the relation

mean - mode = 3 (mean - median)

holds fairly accurately. The student may regard this as an empirical 
relation, since we shall not give any proof. The definition of skewness
most frequently used is that of Karl Pearson. It is

skewness = (mean - mode)/S.D.

When the mode can be accurately determined this definition is a 
convenient one. The frequency curve in the diagram is that of a 
positively skew distribution, the longer tail of the curve being to 
the right. 

The ratio of the fourth moment about the mean of a distribution,
to the square of the variance, is independent of the unit employed.
This invariant of the distribution is called its kurtosis, and is
frequently denoted by $\beta_2$. Thus

$$\beta_2 = \mu_4/\mu_2^2.$$

In the normal distribution, to be considered later, the kurtosis has
the value 3. Since this distribution is regarded as the standard or
ideal, the quantity $\beta_2 - 3$ for any distribution is called its excess of
kurtosis, or briefly its excess. Corresponding to the above notation,
$\beta_1$ is used to denote the invariant $\mu_3^2/\mu_2^3$. This is the square of the
skewness as defined in §4.

EXAMPLES I

1. A distribution consists of three components with frequencies 
of 200, 250 and 300, having means of 25, 10 and 15, and standard 
deviations of 3, 4 and 5 respectively. Show that the mean of the 
combined distribution is 16, and its S.D. 7.2 approximately.

2. The yields of grain (x lb.) from 500 small plots are grouped
in classes with a common class interval (0.2 lb.) in the table below,
the values of x given being the mid-values of the classes. Show
that the mean of the distribution is 3.95 lb., its S.D. 0.46 lb., the
median also 3.95 lb., and the quartiles 3.63 and 4.28 lb.

  x     f  |   x     f  |   x     f  |   x     f  |   x     f
 2.8    4  |  3.4   47  |  4.0   88  |  4.6   35  |  5.2    4
 3.0   15  |  3.6   63  |  4.2   69  |  4.8   10  |
 3.2   20  |  3.8   78  |  4.4   59  |  5.0    8  |

(Mercer and Hall, Journ. Agric. Sci. 1911, vol. 4, p. 107.)*

* The references are to the literature listed at the end of the book. The
student is strongly advised to read some of the literature mentioned at the
end of each chapter, preferably in the order given. He will thus acquire a
better background for the mathematical theory.

3. The wages of 1,000 employees range from 4s. 6d. to 19s. 6d.
They are grouped in 15 classes with a common class interval of 1s.,
and the class frequencies, from the lowest class to the highest, are
6, 17, 35, 48, 65, 90, 131, 173, 155, 117, 75, 52, 21, 9, 6. Tabulate
the data, and show that the mean wage is 12.006s., the S.D.
2.626s. = 2s. 7½d., and the median 12.127s. = 12s. 1½d., and the
mode 12.369s. = 12s. 4½d. nearly. (Adjusted S.D. 2.61s.)

4. With the notation of §4, and by a method similar to that used
in proving (16), show that 



5. The first three moments of a distribution about the value 2
of the variable are 1, 16 and -40. Show that the mean is 3, the
variance 15, and $\mu_3 = -86$. Also show that the first three moments
about x = 0 are 3, 24 and 76.

6. Prove that the mean deviation from the median is less than
that measured from any other value. (See Aitken, 1939, 1, p. 32.)

7. Show that, if the class interval of a grouped distribution is
less than one-third of the calculated S.D., Sheppard's adjustment
makes a difference of less than ½ % in the estimate of the S.D.

8. Show that, if the variable takes the values 0, 1, 2, 3, ..., n
with frequencies proportional to the binomial coefficients
$1, n, \binom{n}{2}, \binom{n}{3}, \ldots, n, 1$ respectively, then the mean of the
distribution is ½n, the second moment about x = 0 is n(n + 1)/4, and the
variance is ¼n.

9. In a continuous distribution, whose relative frequency density
is given by f(x) = 3x(2 - x)/4, the variable ranges from 0 to 2. Show
that the distribution is symmetrical, with mean $\bar{x} = 1$, and variance
1/5. Show that the second and third moments about x = 0 are 6/5
and 8/5 respectively; and verify that $\mu_3 = 0$.

10. Exponential distribution. Consider the continuous distribution
in which $f(x) = ae^{-ax}$, a being positive, and the variable
ranging from 0 to ∞. Show that the mean is 1/a and the variance
$1/a^2$. Also prove that the second and third moments about x = 0
are $2/a^2$ and $6/a^3$ respectively, and that $\mu_3 = 2/a^3$.

11. Factorial moments. The factorial moments of a distribution 
are defined as follows. That of order r about the origin (x = 0) is 



where $x^{(r)} = x(x-1)(x-2)\cdots(x-r+1)$.

Show that factorial moments are related to ordinary moments by
the equations

$$\mu'_{(1)} = \mu'_1 = \bar{x}, \qquad \mu'_{(2)} = \mu'_2 - \mu'_1,$$



and so on. Similarly 



The expression for the factorial moment $\mu_{(r)}$ about the mean is
obtained from that defining $\mu'_{(r)}$ by replacing x by the deviation
$x - \bar{x}$ from the mean. Relations corresponding to the above are
obtained by dropping the dashes and putting $\bar{x} = 0$. The student
may then easily verify that factorial moments about the mean are 
connected with those about the origin by the equations 



- x - 2) . 
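Several of the numerical examples above can be checked mechanically. The sketch below treats Example 1 (components combined) and Example 3 (grouped wages); the function names are hypothetical, the class mid-values 5s., 6s., ..., 19s. are an assumption inferred from the stated range and class interval, and Sheppard's adjustment is taken in its usual form of subtracting h²/12 from the variance, h being the class interval.

```python
from math import sqrt

def pooled_mean_sd(frequencies, means, sds):
    """Mean and S.D. of a distribution made up of several components."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, means)) / n
    # Each component contributes its own variance plus the squared
    # deviation of its mean from the combined mean.
    variance = sum(f * (s * s + (m - mean) ** 2)
                   for f, m, s in zip(frequencies, means, sds)) / n
    return mean, sqrt(variance)

def grouped_mean_sd(mid_values, frequencies):
    """Mean and S.D. of a grouped distribution from class mid-values."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(mid_values, frequencies)) / n
    variance = sum(f * (x - mean) ** 2
                   for x, f in zip(mid_values, frequencies)) / n
    return mean, sqrt(variance)

# Example 1: three components.
ex1_mean, ex1_sd = pooled_mean_sd([200, 250, 300], [25, 10, 15], [3, 4, 5])

# Example 3: wages, with assumed mid-values 5s. to 19s. and interval h = 1s.
wage_freq = [6, 17, 35, 48, 65, 90, 131, 173, 155, 117, 75, 52, 21, 9, 6]
wage_mid = list(range(5, 20))
ex3_mean, ex3_sd = grouped_mean_sd(wage_mid, wage_freq)
ex3_sd_adjusted = sqrt(ex3_sd ** 2 - 1 / 12)   # Sheppard's adjustment

print(ex1_mean, round(ex1_sd, 2))
print(round(ex3_mean, 3), round(ex3_sd, 3), round(ex3_sd_adjusted, 2))
```

The same `grouped_mean_sd` helper may be applied to the grain yields of Example 2 by supplying the thirteen mid-values and frequencies of the table.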



CHAPTER II 

PROBABILITY AND PROBABILITY 
DISTRIBUTIONS 

7. Explanation of terms. Measure of probability 

We begin by approaching the subject of probability from the 
point of view of the classical theory. Later in the chapter we shall 
give the statistical or empirical approach, and the relation between 
the two will become apparent. We hope in this way to provide a 
simple introduction to the subject, adapted to the needs of those 
for whom the book is written. 

The throwing of an ordinary cubical die may result in any one 
of six different cases, in the sense that any one of the six faces may 
be uppermost when the die comes to rest. This group of six cases is 
exhaustive, because it includes all possible cases that may result. 
The different cases are mutually exclusive, since no two faces can 
be uppermost at the same time. The throwing of the die may be 
referred to as a trial. In general a trial is the establishing of certain 
conditions, which must produce one of several results or cases. Two 
such cases are said to be mutually exclusive when the happening of 
one of them precludes the happening of the other; and a group of 
cases is said to be exhaustive when it includes all possible ones. The
cases favourable to an event A are those that entail the
happening of A. For example, in the throwing of the die three of
the cases are favourable to the appearance of an even number, and 
two are favourable to the appearance of a multiple of 3. 

Suppose that our die is a perfect cube made of homogeneous 
material, and that the marking of the faces has not made it dynamic- 
ally unsymmetrical. Then there is no reason to expect that, as the 
result of an unbiased throw, any particular face will come uppermost 
rather than any other. We say that the six cases are equally likely, 
or equally probable. Similarly the 52 cases that may result from 
the drawing of a card without discrimination from an ordinary pack, 
are equally likely. And, in general, two events are said to be equally 

2'2 
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likely if, after all relevant evidence has been taken into account, one 
of them may not be expected rather than the other.* 

The terms probable and probability are used in ordinary language 
in many different connections; and the reader will agree that, in a 
great many instances, it is useless to attempt a numerical estimate 
of the probability under consideration. For instance, we cannot 
give a numerical value to the probability that a certain man's 
political or religious beliefs are correct, or that a statement by a 
perfect stranger is true. There is, however, a very large group of 
questions in connection with which a numerical estimate of prob- 
ability may be attempted with very useful results. We shall first 
state the method of measurement adopted in the classical theory, 
and then give examples of problems to which the method is applic- 
able. 

Measure of probability.† If a trial may result in any one of n
exhaustive, mutually exclusive and equally likely cases, and m of these
are favourable to an event A, then the probability (or chance) that A
will happen as the result of the trial is measured by the quotient m/n.

This measure of the probability of the event A is denoted by p.
Thus

$$p = m/n, \qquad (1)$$

and p is clearly a positive number, not greater than unity. The 
opposite event is understood as the failure of A to happen; and its 
probability is denoted by q. Since the number of cases involving the 
failure of A is n - m, it follows that

q = (n - m)/n = 1 - m/n = 1 - p,
so that p + q = 1. (2)
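The classical measure amounts to counting cases, and may be sketched as follows. The function `classical_probability` is a hypothetical helper written for this illustration; it assumes that the cases supplied are exhaustive, mutually exclusive and equally likely, which the code itself cannot verify.

```python
from fractions import Fraction

def classical_probability(cases, is_favourable):
    """Return m/n, where n cases are possible and m of them are favourable."""
    cases = list(cases)
    m = sum(1 for case in cases if is_favourable(case))
    return Fraction(m, len(cases))

die = range(1, 7)                          # the six faces of an ordinary die

p_even = classical_probability(die, lambda face: face % 2 == 0)
q_even = classical_probability(die, lambda face: face % 2 != 0)
p_multiple_of_3 = classical_probability(die, lambda face: face % 3 == 0)

print(p_even, p_multiple_of_3)             # 1/2 and 1/3
print(p_even + q_even)                     # p + q = 1
```

Exact fractions are used so that the relation p + q = 1 holds without rounding error.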

This method of measuring probability is confined to problems in 
which the results of a trial are reducible to a certain number of 
equally likely cases. Considerations of symmetry and similarity 
frequently enable us to decide whether, in the problem before us, 
the resulting cases are of this nature; and, only if they are so, is the 
calculation valid. Objection is often raised to the use of the idea of 

* Cf. Uspensky, 1937, 2, p. 5. † Ibid. p. 6.

equally probable cases in defining the measure of probability. The
objection loses some of its force if we remember that, in the corre- 
sponding concept of temperature, there are methods of recognizing 
equality of temperature which are quite independent of the measure- 
ment either of absolute temperature or of difference in temperature. 
When we speak of choosing one at random from a group of n 
objects, we mean that the choice is made in such a way that each 
object has the same chance of being selected. Various methods have 
been devised for making such a choice; and these do not assume any 
similarity on the part of the objects in the group. One such method 
is illustrated by the procedure often adopted in the case of a lottery. 
The objects are numbered 1, 2, ..., n, and n similar marbles are 
numbered correspondingly. The marbles are placed in an urn, and 
thoroughly mixed; and one of them is then chosen without dis- 
crimination. The number on this marble determines the object 
selected. The method thus uses the similarity of the marbles to 
ensure that the selection is random. Statisticians have devised 
other methods of random selection ; but it is unnecessary for us to 
examine them. 

Example 1. The chance of throwing a 6 with an ordinary die is 1/6. The 
chance of throwing an odd number is 3/6 = 1/2. 

Example 2. In a class of 12 pupils, 5 are boys and the remainder girls.
The probability that a pupil selected at random will be a girl is 7/12.
The probability that two pupils selected at random will both be girls is
$\binom{7}{2} \div \binom{12}{2} = \frac{7}{22}$. The odds against their being both girls are 15:7.

Example 3. From each of three married couples one of the partners is 
selected at random. What is the probability of their being all of one sex? 

The number of favourable cases is two, viz. all men or all women. The 
total number of equally likely cases is $2^3 = 8$. Hence the required probability
is 1/4. The odds against are therefore 3:1. 

The probability of choosing two men and one woman is 3/8. 
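Examples 2 and 3 may be verified by direct enumeration of the equally likely cases. This is a sketch only; the labels "B", "G", "M" and "W" are arbitrary markers introduced for the illustration.

```python
from fractions import Fraction
from itertools import combinations, product

# Example 2: 5 boys and 7 girls; every pair of pupils is equally likely.
pupils = ["B"] * 5 + ["G"] * 7
pairs = list(combinations(range(len(pupils)), 2))
girl_pairs = sum(1 for i, j in pairs if pupils[i] == "G" and pupils[j] == "G")
p_both_girls = Fraction(girl_pairs, len(pairs))
odds_against = (1 - p_both_girls) / p_both_girls

# Example 3: from each of three couples either the man or the woman is chosen.
selections = list(product("MW", repeat=3))          # 2**3 = 8 cases
p_one_sex = Fraction(sum(1 for s in selections if len(set(s)) == 1),
                     len(selections))
p_two_men_one_woman = Fraction(sum(1 for s in selections if s.count("M") == 2),
                               len(selections))

print(p_both_girls, odds_against)          # 7/22 and 15/7
print(p_one_sex, p_two_men_one_woman)      # 1/4 and 3/8
```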

8. Theorems of total and compound probability 

The determination of probabilities by direct enumeration of the 
number of cases is often laborious. The calculation may be simplified 
by using the theorems of addition and multiplication of probabilities, 
which are also known as the theorems of total and compound
probability. Let us first consider addition.

Suppose that a trial may result in any one of n equally likely cases,
of which $m_1$ are favourable to an event $A_1$, $m_2$ to an event $A_2$, ..., $m_k$
to an event $A_k$, and so on. If the events $A_1, \ldots, A_k$ are mutually
exclusive, the corresponding favourable cases are all different. The
number of cases favourable to either $A_1$ or $A_2$, ... or $A_k$ is therefore
$m_1 + m_2 + \ldots + m_k$. Hence the probability that one of these events
will happen is

$$\frac{m_1 + m_2 + \ldots + m_k}{n} = \frac{m_1}{n} + \frac{m_2}{n} + \ldots + \frac{m_k}{n} = \sum_{i=1}^{k} p_i,$$

where $p_i$ is the probability of the event $A_i$. We may therefore state
the theorem of addition of probabilities:

The probability that one of several mutually exclusive events will
happen is the sum of the probabilities of the separate events.

In particular, if the k events $A_i$ are exhaustive, so that they include
all the mutually exclusive cases that may arise, the sum of the k
numbers $m_i$ is equal to n, and therefore $\sum p_i = 1$. Thus the sum of the
probabilities of the exhaustive and mutually exclusive cases that
may result from the trial is equal to unity. From the above argument
it is also clear that, if an event may occur in several mutually
exclusive forms, the probability of the event is the sum of the
probabilities of its mutually exclusive forms.

Example 1. What is the probability of obtaining a total of 9 points in a 
single throw with two dice ? 

The event may happen in one of the four mutually exclusive forms, 6 and 3, 
5 and 4, 4 and 5, 3 and 6, the chance of each of these being 1/36. Hence the 
required probability is 1/9. 

Show that the probability of a total of 7 points is 1/6, and that of a total 
of 10 points is 1/12. 

Example 2. From a set of 17 cards, numbered 1, 2, ..., 17, one is drawn at 
random. What is the chance that its number is a multiple of 3 or of 7 ? 

The chance of its being a multiple of 3 is 5/17, and that of a multiple of 7 
is 2/17. The required probability is therefore 7/17, the two events being 
mutually exclusive. 

Show that the chance that the number will be a multiple of 3 or of 5, or
of both, is also 7/17.
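The theorem of addition may be illustrated by enumeration. In the sketch below the helper names `p_total` and `p_card` are hypothetical; the final case, multiples of 2 or of 3, is an additional illustration not taken from the text, chosen to show that simple addition fails when the events are not mutually exclusive.

```python
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=2))       # 36 equally likely cases

def p_total(total):
    """Probability of a given total of points with two dice."""
    favourable = sum(1 for first, second in throws if first + second == total)
    return Fraction(favourable, len(throws))

cards = range(1, 18)                                # cards numbered 1 to 17

def p_card(condition):
    """Probability that a card drawn at random satisfies the condition."""
    return Fraction(sum(1 for c in cards if condition(c)), len(cards))

p_3 = p_card(lambda c: c % 3 == 0)
p_7 = p_card(lambda c: c % 7 == 0)
p_3_or_7 = p_card(lambda c: c % 3 == 0 or c % 7 == 0)

# Not mutually exclusive: 6 and 12 are multiples of both 2 and 3.
p_2 = p_card(lambda c: c % 2 == 0)
p_2_or_3 = p_card(lambda c: c % 2 == 0 or c % 3 == 0)

print(p_total(9), p_total(7), p_total(10))          # 1/9, 1/6, 1/12
print(p_3 + p_7, p_3_or_7)                          # both 7/17
print(p_2 + p_3, p_2_or_3)                          # 13/17 against 11/17
```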

Consider next a pair of related trials. Suppose, for instance, that 
we have an urn containing r red and s white balls, and that we make 
a random drawing of two balls in succession. The first trial is the
drawing of the first ball; and, if this proves to be red, we shall say
that the event A has happened. Similarly, the second trial is the
drawing of the second ball, without replacing the first; and, if this
proves to be red, we shall say that the event B has happened. The
probability of A is clearly r/(r + s). That of B depends on whether A
has happened or not. Thus, if A has happened at the first trial, the
probability of B in the second is (r - 1)/(r + s - 1); but, if A has not
happened, it is r/(r + s - 1). The former of these is the conditional
probability of B on the assumption that A has happened. Thus the
two events, A and B, are not independent. Two events are said to
be independent only when the probability of either of them is not 
affected by the happening or failure of the other. By way of illustra- 
tion consider a combined trial consisting of the throwing of two 
ordinary dice, either together or in succession. If the event A is the 
throwing of a 6 with the first die, and the event B the throwing of 
a 5 with the other, the probability of B is 1/6, whether A has hap- 
pened or not. Events A and B are thus independent. 

More generally, suppose that a trial or a combination of trials 
may result in any one of n equally likely cases, some of which entail 
the happening of A alone, some that of B alone, and others that of 
both A and B. Let m be the number favourable to the event A.
The cases favourable to both A and B are all included in these m.
Let $m_1$ be their number. Then the probability p that both A and B
will happen is given by

$$p = \frac{m_1}{n} = \frac{m}{n} \cdot \frac{m_1}{m}.$$

The first quotient, m/n, is the probability of the event A. Also,
since m is the number of cases that entail A, and $m_1$ is the number
of these that entail B also, $m_1/m$ is the conditional probability of
B on the assumption that A has happened. Hence we have the
theorem of compound probability:

The probability of the combined occurrence of two events, A and B,
is the product of the probability of A by the conditional probability
of B on the assumption that A has happened.

In the case of independence the result may be stated simply, that
the probability that two independent events will both happen is
the product of the probabilities of the separate events. The theorem
may clearly be extended to include any number of events.
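The counting argument behind the theorem of compound probability may be sketched for the urn of r red and s white balls. The function `urn_probabilities` is a hypothetical helper; it enumerates the ordered pairs of distinct balls, which are taken to be the n equally likely cases, and it assumes r is at least 1 so that m is not zero.

```python
from fractions import Fraction
from itertools import permutations

def urn_probabilities(r, s):
    """Return P(A), P(B given A) and P(A and B) for two draws without replacement.

    A: the first ball is red.  B: the second ball is red.
    """
    balls = ["red"] * r + ["white"] * s
    draws = list(permutations(range(r + s), 2))     # the n equally likely cases
    n = len(draws)
    m = sum(1 for i, j in draws if balls[i] == "red")            # entail A
    m1 = sum(1 for i, j in draws
             if balls[i] == "red" and balls[j] == "red")         # entail A and B
    return Fraction(m, n), Fraction(m1, m), Fraction(m1, n)

p_a, p_b_given_a, p_both = urn_probabilities(3, 2)

print(p_a)             # r/(r + s) = 3/5
print(p_b_given_a)     # (r - 1)/(r + s - 1) = 1/2
print(p_both)          # 3/10, the product of the two
```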

Example 3. A coin is tossed three times. Find the chance that head and
tail will show alternately.

The required event has two mutually exclusive forms, viz. head-tail-head
and tail-head-tail. By the above theorem the chance of either of these is
$(\frac{1}{2})^3$; and by the theorem of total probability the required chance is then
$(\frac{1}{2})^3 + (\frac{1}{2})^3 = \frac{1}{4}$.

Example 4. Three groups of children contain respectively 3 girls and 
1 boy, 2 girls and 2 boys, 1 girl and 3 boys. One child is selected at random 
from each group. Find the chance that the three selected comprise 1 girl 
and 2 boys. 

The event may happen in any of the mutually exclusive ways girl-boy-boy,
boy-girl-boy, boy-boy-girl. The probabilities of these three are
$\frac{3}{4}\cdot\frac{1}{2}\cdot\frac{3}{4}$, $\frac{1}{4}\cdot\frac{1}{2}\cdot\frac{3}{4}$, $\frac{1}{4}\cdot\frac{1}{2}\cdot\frac{1}{4}$ respectively. The required probability is their sum, which is
13/32.

9. Probability distributions. Expected value 

Suppose that, corresponding to the n exhaustive and mutually 
exclusive cases that may result from a trial, a variable x assumes the 
n values $x_i$ with corresponding probabilities $p_i$. Then the assemblage
of values $x_i$, with their probabilities $p_i$, constitutes the probability
distribution of the variable for that trial. A variable, which possesses
a probability distribution, is often called a variate. Most of the
concepts introduced in connection with frequency distributions are
equally applicable to probability distributions. Thus the rth
moment, $\mu'_r$, of the distribution about the value x = 0 is defined by
the equation

$$\mu'_r = \sum_i p_i x_i^r, \qquad (3)$$

probability taking the place of relative frequency since, as proved
in §8, $\sum p_i = 1$. In particular, the first moment about x = 0 is

$$\mu'_1 = \sum_i p_i x_i. \qquad (4)$$

Just as in the case of frequency this moment is called the mean value
of the variate, or of the distribution. More commonly, however, it
is referred to as the expected value or expectation of the variate. It
is denoted by E(x), or sometimes by $\bar{x}$, though the former is preferable.
Thus

$$E(x) = \sum_i p_i x_i. \qquad (5)$$

Let $\psi(x)$ be a function of the variate x. This assumes the value
$\psi(x_i)$ when x assumes the value $x_i$; and $p_i$ is therefore the probability
of this value of the function. The expected value of the function is
thus given by

$$E[\psi(x)] = \sum_i p_i \psi(x_i). \qquad (6)$$

In particular (3) expresses that $\mu'_r$ is the expected value of the rth
power of the variate.

The rth moment about the mean of the probability distribution
is defined by a formula corresponding to §4 (12). Thus

$$\mu_r = \sum_i p_i (x_i - \bar{x})^r = E([x - E(x)]^r). \qquad (7)$$

In particular, the second moment about the mean is the variance of
the distribution; and its positive square root is the standard deviation,
$\sigma$. The relation

$$\mu_2 = \mu'_2 - \bar{x}^2 \qquad (8)$$

holds as before. It may be expressed in the alternative notation

$$E([x - E(x)]^2) = E(x^2) - [E(x)]^2. \qquad (8')$$

The relations between $\mu_r$ and $\mu'_r$ are the same as for a frequency
distribution; while, corresponding to §2 (6), we have the identity

$$E[x - E(x)] = 0. \qquad (9)$$

In words, the expected value of the deviation of a variate from its mean 
is zero. 

Example 1. What is the expected value of the number of points that will 
be obtained in a single throw with an ordinary die? 

Here the variate is the number of points showing. It assumes the values 
1, 2, ..., 6 with probability 1/6 in each case. Hence 

E(x) = (1 + 2 + ... + 6)/6 = 7/2 = 3.5.

Example 2. From an urn, containing 3 red balls and 2 white, a man is to
draw two balls at random without replacement, being promised 20s. for each
red ball he draws, and 10s. for each white one. Find his expectation.

For the possible results of the drawing there are three exhaustive and
mutually exclusive cases, viz. 2 red balls, 1 red and 1 white, and 2 white.
The corresponding probabilities are easily shown to be 3/10, 3/5 and 1/10.
The variable is the amount to be given to the man on the result of the draw;
and this has the value 40s. in the first case, 30s. in the second, and 20s. in the
third. The expected value of the amount to be given is therefore

(3/10 × 40 + 3/5 × 30 + 1/10 × 20)s. = 32s.

Example 3. Show that, if c is constant,

$E(cx) = cE(x), \qquad E[c\psi(x)] = cE[\psi(x)].$
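The definitions of this section translate directly into a few lines of code. The sketch below represents a probability distribution as a list of (value, probability) pairs, which is a choice made for this illustration, and the function name `expectation` is hypothetical. It checks the die of Example 1, the prize of Example 2, and the identity that the expected deviation from the mean is zero.

```python
from fractions import Fraction

def expectation(distribution, func=lambda x: x):
    """Expected value of func(x): the sum of p_i * func(x_i) over all values."""
    return sum(p * func(x) for x, p in distribution)

# Example 1: points shown by an ordinary die.
die = [(x, Fraction(1, 6)) for x in range(1, 7)]
mean_die = expectation(die)
variance_die = expectation(die, lambda x: (x - mean_die) ** 2)
variance_die_alt = expectation(die, lambda x: x * x) - mean_die ** 2
mean_deviation = expectation(die, lambda x: x - mean_die)

# Example 2: amount received, in shillings, with its probabilities.
prize = [(40, Fraction(3, 10)), (30, Fraction(3, 5)), (20, Fraction(1, 10))]
expected_prize = expectation(prize)

print(mean_die)                            # 7/2
print(variance_die, variance_die_alt)      # both 35/12
print(mean_deviation)                      # 0
print(expected_prize)                      # 32
```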



10. Expected value of a sum or a product of two variates 

Let x and y be two variates, the first of which assumes the m values
$x_i$ with probabilities $p_i$ (i = 1, 2, ..., m), and the second assumes the
n values $y_j$ with probabilities $p'_j$ (j = 1, 2, ..., n). The sum x + y can
assume the mn values $x_i + y_j$, since any of the m values of i may
be associated with any of the n values of j. Let $p_{ij}$ denote the
probability that x assumes the value $x_i$ and, at the same time, y
assumes the value $y_j$. Then by (5)

$$E(x+y) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_{ij}(x_i + y_j) = \sum_{i=1}^{m} x_i \sum_{j=1}^{n} p_{ij} + \sum_{j=1}^{n} y_j \sum_{i=1}^{m} p_{ij}.$$

For $\sum_j p_{ij}$, being the sum of the probabilities that x assumes the
value $x_i$ while y assumes one of the values $y_1, y_2, \ldots, y_n$, is equal to
$p_i$. Similarly $\sum_i p_{ij} = p'_j$. Consequently

$$E(x+y) = \sum_i p_i x_i + \sum_j p'_j y_j = E(x) + E(y). \qquad (10)$$

Thus the expected value of the sum of two variates is equal to the sum
of their expected values; and the theorem may be extended to the
sum of any number of variates.
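It is worth observing that the theorem on the expected value of a sum does not require the variates to be independent. The sketch below uses an illustrative joint distribution, not taken from this section: two balls are drawn without replacement from an urn of 3 red and 2 white, x being 1 if the first ball is red and 0 otherwise, and y likewise for the second ball. The helper name `marginal` is hypothetical.

```python
from fractions import Fraction

# Joint probabilities p_ij for the pairs of values (x, y).
joint = {
    (1, 1): Fraction(3, 10),
    (1, 0): Fraction(3, 10),
    (0, 1): Fraction(3, 10),
    (0, 0): Fraction(1, 10),
}

def marginal(joint_probabilities, axis):
    """Sum p_ij over the other index, giving p_i (axis 0) or p'_j (axis 1)."""
    totals = {}
    for values, p in joint_probabilities.items():
        totals[values[axis]] = totals.get(values[axis], 0) + p
    return totals

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)

e_x = sum(x * p for x, p in p_x.items())
e_y = sum(y * p for y, p in p_y.items())
e_sum = sum((x + y) * p for (x, y), p in joint.items())

print(e_x, e_y, e_sum)                     # 3/5, 3/5 and 6/5
print(joint[(1, 1)], p_x[1] * p_y[1])      # 3/10 against 9/25: not independent
```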

The variates, x and y, are said to be independent* if the probability
that either of them will assume a prescribed value does not depend
on the value assumed by the other.

* Or, statistically independent. Since this is the kind of independence with
which we are chiefly concerned in these pages, the adverb 'statistically' will
usually be omitted. Statistical independence has been described by Aitken
(1939, 1, p. 148) as 'obedience to the multiplication theorem of probability'.
This will be apparent to the student after he has read §§12 and 31 below.
Statistical independence should not be confused with functional independence.
The variables x and y are functionally dependent when there exists a
functional relation F(x, y) = 0 which holds identically. They are functionally
independent if no such relation exists.

We proceed to examine the
expected value of the product, xy, of two independent variates. In
the above notation this product may assume the mn mutually
exclusive values $x_i y_j$. The probability that the product will assume
a particular value, $x_i y_j$, is $p_i p'_j$, being the product of the probabilities
that x will assume the value $x_i$ and y the value $y_j$. Hence the
expected value of the product is given by

$$E(xy) = \sum_{i=1}^{m}\sum_{j=1}^{n} p_i p'_j x_i y_j.$$

Performing first the summation with respect to j, and then that
with respect to i, we have

$$E(xy) = \sum_i p_i x_i \Big(\sum_j p'_j y_j\Big) = E(y) \sum_i p_i x_i = E(x) . E(y). \qquad (11)$$

Thus the expected value of the product of two independent variates is
equal to the product of their expected values.
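The part played by independence in the product theorem may be seen from a small computation. The sketch below takes two ordinary dice as the independent variates, and then, as a contrasting case chosen for this illustration, lets y be the same die as x, so that the variates are completely dependent.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

e_x = sum(x * p for x, p in die.items())

# Independent variates: the probability of the pair (x, y) is p_i * p'_j.
e_xy_independent = sum(x * y * p_x * p_y
                       for x, p_x in die.items()
                       for y, p_y in die.items())

# Dependent variates: y is identical with x, so xy takes the value x**2.
e_xy_same_die = sum(x * x * p for x, p in die.items())

print(e_x * e_x)            # 49/4
print(e_xy_independent)     # 49/4, equal to E(x).E(y)
print(e_xy_same_die)        # 91/6, not equal to E(x).E(y)
```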

If the mean of either of the independent variates, say x, is taken
as origin for that variate, then E(x) = 0, and consequently E(xy) = 0.
In particular this relation holds when both the variates are measured
from their means. The expected value of the product of the
deviations of the two variates from their means is called their
covariance. Thus the covariance of two independent variates is equal
to zero. From this we may deduce the important theorem:

The variance of the sum of two independent variates is equal to the
sum of their variances.

If the two variates are measured from their means the variance
of their sum, being the expected value of $(x + y)^2$, is given by

$$E[(x+y)^2] = E(x^2) + 2E(xy) + E(y^2) = E(x^2) + E(y^2),$$

and is therefore equal to the sum of the variances of x and y. The
theorem may clearly be extended to the sum of any number of
variates which are independent in pairs.

11. Repeated trials. Binomial distribution

Suppose that a trial is repeated, so that we have a series of n 
trials. The happening of the event A as the result of a trial will be
called a success. We consider first the case in which the trials are
independent, and the probability p of success is the same for each
trial. Then the probability of failure is q, where p + q = 1. The
probability that there will be exactly r successes in the series of n
trials, and therefore n - r failures, is easily found. For, by the theorem
of compound probability, the chance of r successes and n - r failures
in a specified order is $p^r q^{n-r}$. But the number of different orders in
which these successes and failures may occur is $\binom{n}{r}$, being the number
of ways of selecting r out of the n positions for the successes.
Consequently, by the theorem of total probability, the chance P of
exactly r successes in the series of trials is given by

$$P = \binom{n}{r} p^r q^{n-r}. \qquad (12)$$

Thus the probabilities of 0, 1, 2, ..., n successes, in a series of n
trials for an event of constant probability p, are the respective terms
of the binomial expansion

$$(q + p)^n = q^n + nq^{n-1}p + \ldots + nqp^{n-1} + p^n.$$

The probability distribution of the number of successes thus determined
is called the binomial distribution. Thus the binomial distribution
is that in which the variate assumes the values 0, 1, 2, ..., n
with corresponding probabilities $q^n, nq^{n-1}p, \ldots, p^n$.
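The two steps of the argument, a probability for each specified order and then a sum over the orders, may be sketched as follows. Both function names are hypothetical, and the values n = 4 and p = 1/3 are arbitrary. The first function uses the binomial coefficient directly; the second enumerates every sequence of successes and failures.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_distribution(n, p):
    """Probabilities of 0, 1, ..., n successes in n independent trials."""
    q = 1 - p
    return [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]

def binomial_by_enumeration(n, p):
    """The same probabilities, found by summing over every specified order."""
    q = 1 - p
    totals = [Fraction(0)] * (n + 1)
    for order in product((1, 0), repeat=n):     # 1 = success, 0 = failure
        r = sum(order)
        totals[r] += p ** r * q ** (n - r)      # compound probability
    return totals

probabilities = binomial_distribution(4, Fraction(1, 3))
enumerated = binomial_by_enumeration(4, Fraction(1, 3))

print(probabilities)            # terms of the expansion of (q + p)**4
print(sum(probabilities))       # 1
```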

We may show that the expected value of the variate in the binomial
distribution* is np, and its variance npq. In terms of repeated trials,
the expected value of the number of successes in one trial is p,
since the variate assumes the values 1 and 0 with probabilities p
and q respectively. And, since the number of successes in n trials
is the sum of the numbers of successes in the individual trials, it
follows from (10) that the expected value of this number is np.
Similarly, to find the variance of the distribution we observe that,
in one trial, the square of the number of successes takes the values
1 and 0 with probabilities p and q respectively, and the expected
value of $x^2$ in one trial is therefore p. Hence the variance of the
number of successes in one trial is

$$p - p^2 = p(1 - p) = pq.$$

And since the number of successes in n trials is the sum of the numbers
in the individual trials, and these trials are independent, the
variance of the total number of successes is the sum of the variances
for the separate trials, and is therefore npq. The S.D. of the number
of successes in n trials is $\sqrt{npq}$; and the S.D. of the proportion of
successes in the series is therefore

$$\sqrt{npq}/n = \sqrt{pq/n}.$$

* Other proofs of these properties will be given in §15, Ex. 3, and §17.
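The values np and npq may be confirmed directly from the probabilities of the binomial distribution. This is a sketch under the assumption n = 20 and p = 2/5, values chosen only for illustration; exact fractions are used for the mean and variance, and floating point only for the square roots.

```python
from fractions import Fraction
from math import comb, sqrt

n, p = 20, Fraction(2, 5)
q = 1 - p

# Probabilities of r = 0, 1, ..., n successes.
pmf = [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]

mean = sum(r * prob for r, prob in enumerate(pmf))
variance = sum((r - mean) ** 2 * prob for r, prob in enumerate(pmf))

# S.D. of the proportion of successes: S.D. of the number, divided by n.
sd_proportion = sqrt(variance) / n

print(mean, n * p)                          # both 8
print(variance, n * p * q)                  # both 24/5
print(sd_proportion, sqrt(p * q / n))       # both about 0.1095
```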



Example 1. Find the most probable number of successes in the above series 
of trials, that is to say, the number of successes which has a greater prob- 
ability than any other. 

The chance of r + 1 successes will be greater than that of r if

$$\binom{n}{r+1} p^{r+1} q^{n-r-1} > \binom{n}{r} p^r q^{n-r},$$

i.e. if $\frac{n-r}{r+1} \cdot \frac{p}{q} > 1$,
that is, if (n + 1)p > r + 1.

Hence the most probable number of successes is the integral part of (n + 1)p.
If, however, (n + 1)p is an integer, r + 1, the chance of r + 1 successes is equal
to that of r successes, and is greater than that of any other number. Thus, if
p = 2/5, the most probable number of successes in 20 trials is the integral
part of 21 × 2/5, which is 8. In 24 trials, however, 9 and 10 are the most
probable numbers.

Example 2. A's chance of winning a game against B is 2/3. Find his
chance of winning at least three games out of five.

This chance is the sum of the probabilities that A will win 3, 4, and 5 games,
and is therefore

$$\binom{5}{3}\left(\tfrac{2}{3}\right)^3\left(\tfrac{1}{3}\right)^2 + \binom{5}{4}\left(\tfrac{2}{3}\right)^4\left(\tfrac{1}{3}\right) + \left(\tfrac{2}{3}\right)^5 = \frac{192}{243}.$$
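Examples 1 and 2 may be checked by computing the binomial probabilities exactly. The function names `binomial_pmf` and `most_probable` are hypothetical; `most_probable` simply searches the probabilities for the greatest, returning every number of successes that attains it.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Probabilities of 0, 1, ..., n successes in n independent trials."""
    q = 1 - p
    return [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]

def most_probable(n, p):
    """Numbers of successes whose probability is the greatest."""
    pmf = binomial_pmf(n, p)
    greatest = max(pmf)
    return [r for r, prob in enumerate(pmf) if prob == greatest]

# Example 1 with p = 2/5.
modes_20 = most_probable(20, Fraction(2, 5))
modes_24 = most_probable(24, Fraction(2, 5))

# Example 2: at least three wins out of five with p = 2/3.
p_at_least_three = sum(binomial_pmf(5, Fraction(2, 3))[3:])

print(modes_20)             # [8], the integral part of 21 * 2/5
print(modes_24)             # [9, 10], since 25 * 2/5 is an integer
print(p_at_least_three)     # 64/81, that is 192/243 in lowest terms
```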

Example 3. Poisson's series of trials is a series of n trials in which the
probabilities of success are $p_1, p_2, p_3, \ldots, p_n$ respectively. Show that the
expected value of the number of successes is $\sum_{i=1}^{n} p_i$, and the variance $\sum p_i q_i$.

By the same argument as above, the expected value of the number of
successes at the ith trial is $p_i$, and its variance $p_i q_i$. Hence the result.

Example 4. Verify from (7) that a distribution, in which the variate takes
the values 1 and 0 with probabilities p and q respectively, has variance pq.

12. Continuous probability distributions

As in the case of frequency, a continuous distribution is one in 
which the variable may take any value between certain limits, 
a and b. The number of different values is then infinite, and it is 
useless to speak of the probability of any particular value. Instead 
of this we consider the probability that the value of the variate will 
fall within a specified interval in the range of the variate. We confine 
our attention to cases in which the probability that the value of the 
variate will fall within the infinitesimal interval $x - \frac{1}{2}dx$ to $x + \frac{1}{2}dx$
is expressible in the form $\phi(x)\,dx$, where $\phi(x)$ is a continuous function
of x, called the probability density or the probability function. It is,
of course, never negative. The continuous curve $y = \phi(x)$ is called
the probability curve; and, when this is symmetrical, the distribution
is said to be symmetrical. The area under the curve from $x = \alpha$
to $x = \beta$ represents the probability that the value of x will fall
within the interval $\alpha$ to $\beta$. The total area under the curve is unity.
Thus

$$\int_a^b \phi(x)\,dx = 1. \qquad (13)$$

When $\phi(x)$ is constant, the variate is said to have a uniform or
rectangular distribution of probability. The value of the constant
must be 1/(b - a), in virtue of (13).

The rth moment of the distribution about a particular value is
the rth moment of the probability about that value. Since $\phi(x)\,dx$
is the probability corresponding to the interval dx, the rth moment
about x = 0 is given by

$$\mu'_r = \int_a^b x^r \phi(x)\,dx. \qquad (14)$$

In particular, the expected value of x, being the first moment about
x = 0, is

$$E(x) = \mu'_1 = \int_a^b x\phi(x)\,dx, \qquad (15)$$

and the expected value of the function $\psi(x)$ is

$$E[\psi(x)] = \int_a^b \psi(x)\phi(x)\,dx. \qquad (16)$$

The rth moment about the mean of the distribution is

$$\mu_r = \int_a^b [x - E(x)]^r \phi(x)\,dx \qquad (17)$$

or

$$\mu_r = E([x - E(x)]^r),$$

which is the expected value of the rth power of the deviation of x
from the mean. The variance of x, being the second moment about
the mean, is given by

$$\mu_2 = \int_a^b [x - E(x)]^2 \phi(x)\,dx$$

or

$$\sigma^2 = \int_a^b x^2\phi(x)\,dx - [E(x)]^2. \qquad (18)$$
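The integrals defining the moments of a continuous distribution may be approximated numerically. The sketch below assumes a midpoint rule, with the hypothetical helper `integrate`, and takes as its density the function 3x(2 - x)/4 on the range 0 to 2, which appears in Examples I, No. 9; the two forms of the variance, by (17) with r = 2 and by (18), are compared.

```python
def integrate(func, lower, upper, steps=20000):
    """Midpoint-rule approximation to the integral of func from lower to upper."""
    h = (upper - lower) / steps
    return sum(func(lower + (k + 0.5) * h) for k in range(steps)) * h

def phi(x):
    """Probability density 3x(2 - x)/4 on the range 0 to 2."""
    return 3.0 * x * (2.0 - x) / 4.0

a, b = 0.0, 2.0

total_area = integrate(phi, a, b)                                  # (13)
mean = integrate(lambda x: x * phi(x), a, b)                       # (15)
variance_17 = integrate(lambda x: (x - mean) ** 2 * phi(x), a, b)  # (17), r = 2
variance_18 = integrate(lambda x: x * x * phi(x), a, b) - mean ** 2  # (18)

print(total_area)                   # close to 1
print(mean)                         # close to 1
print(variance_17, variance_18)     # both close to 1/5
```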

The theorems of §10 are equally true for continuous distributions.
Let the values of x range from a to b, and those of a second variate
y from c to d. We may assume that the probability that x falls in the
interval dx and, at the same time, y falls in the interval dy is jointly
proportional to dx and dy, and expressible in the form $\phi(x, y)\,dx\,dy$,
where $\phi(x, y)$ is a continuous function of the two variates. Then the
expected value of the sum of the variates is

$$E(x+y) = \int_a^b\!\int_c^d (x+y)\,\phi(x,y)\,dx\,dy = \int_a^b\!\int_c^d x\,\phi(x,y)\,dx\,dy + \int_a^b\!\int_c^d y\,\phi(x,y)\,dx\,dy.$$

In the first of these two integrals let the integration with respect to
y be performed first. Then $\int_c^d \phi(x,y)\,dx\,dy$ is the probability that x
will fall in the interval dx irrespective of the value of y. Denote it
by $\phi_1(x)\,dx$. Similarly, in the second integral, $\int_a^b \phi(x,y)\,dx\,dy$ is the
probability that y will fall in the interval dy irrespective of the value
of x. Denote it by $\phi_2(y)\,dy$. Then

$$E(x+y) = \int_a^b x\,\phi_1(x)\,dx + \int_c^d y\,\phi_2(y)\,dy = E(x) + E(y), \qquad (19)$$

as required.

In considering the expected value of the product $xy$ we assume
that the variates are independent. Then the probability that the
value of $x$ falls in the interval $dx$ is independent of the value of $y$,
and may be expressed as $\phi_1(x)\,dx$. Similarly, the probability that
$y$ falls in the interval $dy$ is independent of the value of $x$, and is
expressible in the form $\phi_2(y)\,dy$. Therefore the probability that $x$
falls in the interval $dx$ and, at the same time, $y$ falls in the interval
$dy$ is $\phi_1(x)\,\phi_2(y)\,dx\,dy$. Accordingly, the expected value of the
product $xy$ is

$$E(xy) = \int_a^b\!\int_c^d xy\,\phi_1(x)\,\phi_2(y)\,dx\,dy
= \int_a^b x\,\phi_1(x)\,dx \int_c^d y\,\phi_2(y)\,dy = E(x)\,.\,E(y). \tag{20}$$

Thus the expected value of the product of two independent variates
is equal to the product of their expected values. And it follows, as
in §10, that the covariance of two independent variates is equal
to zero.

Example 1. Show that, if the variable $x$ has uniform distribution of
probability over the range $-a$ to $+a$, then $\phi(x) = 1/2a$, $\bar{x} = 0$,
$\sigma^2 = \tfrac{1}{3}a^2$, the moments of odd order about $x = 0$ are zero, and that
of order $2n$ is $a^{2n}/(2n + 1)$.

Example 2. Through a point B on the y-axis, whose ordinate is positive
and equal to $a$, a straight line is drawn in a direction taken at random in the
interval $\theta = -\tfrac{1}{4}\pi$ to $\theta = \tfrac{1}{4}\pi$, $\theta$ being the inclination of the line to BO.
Examine the probability distribution of the intercept $x$ on the x-axis.

We interpret the data as meaning that the variable $\theta$ has uniform distribution
of probability in the interval $-\tfrac{1}{4}\pi$ to $\tfrac{1}{4}\pi$. Hence the probability that
$\theta$ will fall in the interval $d\theta$ is $2\,d\theta/\pi$. Now the intercept on the x-axis has the
value $x = a\tan\theta$, so that $\theta = \arctan(x/a)$, and therefore

$$d\theta = \frac{a\,dx}{a^2 + x^2}.$$

But, when $\theta$ falls in the interval $d\theta$, $x$ falls in the corresponding interval $dx$.
Hence the probability density for the distribution of $x$ is

$$\phi(x) = \frac{2a}{\pi(a^2 + x^2)} \qquad (-a \le x \le a).$$

This is also the relative frequency density of the distribution in §6, Ex. 2,
and the properties of the distributions are the same. There is symmetry
about $x = 0$, and the S.D. is $0.52a$.
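
The stated S.D. of Example 2 can be checked numerically. The sketch below is only an illustration, not part of the original argument: it integrates the density $2a/\pi(a^2 + x^2)$ over $-a \le x \le a$ by the midpoint rule. The function names, the step count and the value $a = 1$ are assumptions chosen for the illustration.

```python
import math

def intercept_density(x, a):
    """Density 2a / (pi (a^2 + x^2)) of the intercept x = a*tan(theta), -a <= x <= a."""
    return 2 * a / (math.pi * (a * a + x * x))

def midpoint_integral(f, lo, hi, steps=20000):
    """Midpoint-rule approximation to the integral of f from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

a = 1.0  # illustrative value of the ordinate of B (an assumption)
total = midpoint_integral(lambda x: intercept_density(x, a), -a, a)
variance = midpoint_integral(lambda x: x * x * intercept_density(x, a), -a, a)
sd = math.sqrt(variance)
# Closed form of the same integral: variance = a^2 (4/pi - 1).
exact_sd = a * math.sqrt(4 / math.pi - 1)
print(total, sd, exact_sd)
```

The total area should come out very close to unity, and the computed S.D. should agree with the closed form, which rounds to $0.52a$.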

13. Theorems of Tchebychef and Bernoulli 

Consider first a theorem due to Tchebychef. Let $x$ be a variate,
either discrete or continuous, with S.D. $\sigma$. The theorem to be proved
fixes an upper limit to the probability that a value of the variate,
chosen at random, will differ from the mean by more than $\lambda\sigma$,
where $\lambda$ is a given positive number. Tchebychef's theorem may be
stated:

In a random choice of a value of a variate, whose standard deviation
is $\sigma$, the probability that the value chosen will differ from the mean by
more than $\lambda\sigma$ does not exceed $1/\lambda^2$.

To prove it we observe that the variance $\sigma^2$ is the second moment
of the probability of the whole distribution about the mean. Now,
if the combined probability $P$ of values further than $\lambda\sigma$ from the
mean were greater than $1/\lambda^2$, the second moment of the probability
of these values alone would exceed $(\lambda\sigma)^2/\lambda^2$, i.e. $\sigma^2$, and that of the
whole distribution would, a fortiori, be greater than $\sigma^2$. Since this
is not so the statement in the theorem must be true. Thus

$$P \le 1/\lambda^2. \tag{21}$$

In particular the probability that a value of the variate will differ
from the mean by more than $3\sigma$ does not exceed 1/9. The theorem
is a very conservative one, since the actual value of $P$ is usually
very much less than $1/\lambda^2$. The result, however, applies to all
distributions; and we shall now use it to prove a theorem due to
Bernoulli.
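
How conservative the bound (21) is can be seen on a particular case. The following sketch is an illustration only: it takes the exponential density $ce^{-cx}$ ($x \ge 0$), whose mean and S.D. are both $1/c$, and compares the exact probability of a deviation exceeding $\lambda\sigma$ with $1/\lambda^2$. The choice of distribution and the function name are assumptions made for the example.

```python
import math

def exponential_tail(lam):
    """P(|x - mean| > lam * sd) for the density c*exp(-c*x), x >= 0.

    The result does not depend on c, because mean = sd = 1/c.
    """
    upper = math.exp(-(1 + lam))                           # P(x > mean + lam*sd)
    lower = 1 - math.exp(-(1 - lam)) if lam < 1 else 0.0   # P(x < mean - lam*sd)
    return upper + lower

for lam in (1.5, 2.0, 3.0):
    # exact probability, followed by Tchebychef's upper limit
    print(lam, exponential_tail(lam), 1 / lam**2)
```

For $\lambda = 3$ the exact probability is $e^{-4}$, far below the limit 1/9 given by the theorem.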

Let $m$ be the number of successes obtained in $n$ independent
trials, in which the constant probability of the occurrence of the
event A is $p$. The quotient $m/n$ is the relative frequency of successes.
How does this quotient behave as $n$ increases indefinitely? It is a
matter of common knowledge that, in trials of this nature in which
the value of $p$ is known, the relative frequency obtained is usually a
close approximation to $p$ when $n$ is large. But there is no proof,
based on the measure of probability given in §7, that the relative
frequency of successes will have $p$ as limiting value when $n$ tends to
infinity. James Bernoulli, however, proved that, given any positive
number $\epsilon$ however small, the probability of $|m/n - p|$ exceeding $\epsilon$
tends to zero as $n$ tends to infinity. This is expressed in modern
terminology by saying that $m/n$ converges in probability to $p$ as $n$
tends to infinity. Bernoulli's theorem may also be expressed in the
form:

Let $\epsilon$ and $\eta$ be two given positive numbers, however small, and let $m$
be the number of successes in $n$ independent trials, in which the constant
probability of success is $p$. Then the probability that

$$\left|\frac{m}{n} - p\right| < \epsilon \tag{22}$$

will hold is greater than $1 - \eta$, provided that $n$ is greater than a certain
number $N$, depending on $\epsilon$ and $\eta$.

A simple proof may be given as follows. It was shown in §11 that
the mean value of the relative frequency of successes in such a
series of trials is $p$, and its variance $\sigma^2$ is $pq/n$. If then we write
$\epsilon = \lambda\sigma$, the condition

$$\left|\frac{m}{n} - p\right| > \epsilon$$

is the condition that the relative frequency of successes should
differ from its mean by more than $\lambda\sigma$. But, by Tchebychef's theorem,
the probability of this does not exceed $1/\lambda^2$, so that

$$P \le \frac{1}{\lambda^2} = \frac{\sigma^2}{\epsilon^2} = \frac{pq}{n\epsilon^2},$$

and this is less than $\eta$ for all values of $n$ greater than $pq/\eta\epsilon^2$. Hence
Bernoulli's theorem.

The theorem does not assert that the inequality (22) must hold
for all values of $n$ greater than $N$, but that the probability of its not
holding is less than $\eta$. Even this probability, however small, leaves
room for the possibility that it may not hold on some particular
occasion. Bernoulli's theorem explains the practice of taking $m/n$,
for a large value of $n$, as an approximation to the value of $p$. Indeed,
it frequently happens that this use of relative frequency is our only
means of estimating the probability of the event.
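
The number $N = pq/\eta\epsilon^2$ supplied by the proof is usually far larger than necessary. The sketch below, offered as an illustration under assumed values of $p$, $\epsilon$ and $\eta$, computes the first integer exceeding $N$ and then the exact binomial probability that $m/n$ differs from $p$ by $\epsilon$ or more. Exact rational arithmetic is used so that no rounding affects the comparison; the function names are hypothetical.

```python
import math
from fractions import Fraction

def chebyshev_trials(p, eps, eta):
    """Smallest integer n exceeding p*q / (eta * eps^2), for Fraction arguments."""
    q = 1 - p
    return math.floor(p * q / (eta * eps**2)) + 1

def prob_outside(n, p, eps):
    """Exact P(|m/n - p| >= eps) when m is binomial with index n and chance p."""
    q = 1 - p
    return sum(math.comb(n, m) * p**m * q**(n - m)
               for m in range(n + 1) if abs(Fraction(m, n) - p) >= eps)

# Illustrative values (assumptions): p = 1/2, eps = 1/10, eta = 1/20.
p, eps, eta = Fraction(1, 2), Fraction(1, 10), Fraction(1, 20)
n = chebyshev_trials(p, eps, eta)
print(n, float(prob_outside(n, p, eps)), float(eta))
```

The exact probability is expected to be many times smaller than $\eta$, which illustrates the remark that Tchebychef's limit is a very conservative one.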

14. Empirical definition of probability 

As already indicated the definition of probability in terms of
equally likely cases does not lend itself to every instance in which
a numerical evaluation of probability is desired. Another definition
of the probability of an event is sometimes given in terms of the
relative frequency of the occurrence of the event in an extended
series of trials. The fundamental assumption for such a definition
is that this relative frequency, in a uniform series of trials, tends to
a definite limit as the number of trials in the series tends to infinity;
and this limit is taken as the measure of the probability of the
occurrence of the event in another such trial. No proof can be given
that the above relative frequency does tend to a limit. Convergence
in probability, mentioned in connection with Bernoulli's theorem,
is a consequence of the original definition, and is not the same
thing as the convergence to a limit assumed in the empirical
definition. But, though the two approaches are not theoretically
equivalent, they may be regarded as in agreement for practical
purposes.

By means of the assumption on which the empirical definition is
based, the laws of addition and multiplication of probabilities can
be deduced. Suppose, as in §8, that a trial may result in any one of
the mutually exclusive events $A_i$. In a series of $n$ trials let $m_i$ be
the number of times in which the event $A_i$ happens. Then the
probability $p_i$ of the happening of this event is given by

$$p_i = \lim_{n \to \infty} \frac{m_i}{n}.$$

Since the number of times in which one of the events $A_1, A_2, \ldots, A_k$
happens is $\sum_{i=1}^{k} m_i$, the probability of the happening of one of these
events is

$$p = \lim_{n \to \infty} \sum_{i=1}^{k} \frac{m_i}{n} = \sum_{i=1}^{k} \lim_{n \to \infty} \frac{m_i}{n} = \sum_{i=1}^{k} p_i,$$

and we have the theorem of total probability as in §8.

Suppose next that A and B are different events that may happen
as the results of specified trials T and T' respectively. We require
the probability that, in a pair of such trials, both events will happen.
In a series of $n$ pairs of trials let $m$ be the number of times in which A
happens, and $m_1$ the number of times in which both A and B happen.
These $m_1$ occasions are all included in the $m$ occasions on which A
happens. Then the probability of the happening of both events is

$$p = \lim_{n \to \infty} \frac{m_1}{n} = \lim_{n \to \infty} \left(\frac{m}{n} \cdot \frac{m_1}{m}\right)
= \left(\lim_{n \to \infty} \frac{m}{n}\right)\left(\lim_{m \to \infty} \frac{m_1}{m}\right),$$

since $m$ tends to infinity with $n$, if the probability of A is not zero.
Now $\lim m/n$ is the probability of A, and $\lim m_1/m$ is the
conditional probability of B on the assumption that A has happened.
We thus have the theorem of compound probability as in §8. In
the particular case of independence the probability of the occurrence
of the two events is simply the product of the probabilities of the
separate events. As we have already seen, the theorems of total and
compound probability are the foundations of the mathematical
theory. The measure of probability by relative frequency is also
fundamental. In the a priori definition it is the relative frequency
of favourable cases in the total number of cases, while, in the
empirical definition, it is the limit of the relative frequency of the
happenings of the event under consideration.

15. Moment generating function and characteristic function 

Let $\phi(x)$ be the probability density in the distribution of the
variate $x$. The expected value of $e^{tx}$ is a function of $t$ given by

$$M(t) = \int e^{tx}\,\phi(x)\,dx, \tag{23}$$

where the integration is taken over the whole range of $x$. When this
integral has a meaning for a certain range of values of $t$, we may
expand the exponential and integrate term by term, thus obtaining
the formula

$$M(t) = 1 + \mu'_1 t + \mu'_2 t^2/2! + \mu'_3 t^3/3! + \ldots, \tag{24}$$

where

$$\mu'_r = \int x^r\,\phi(x)\,dx,$$

being the moment of order $r$ about the origin ($x = 0$). For this
reason the function $M(t)$, defined by (23), is called the moment
generating function (m.g.f.) of the distribution about the value
$x = 0$. Similarly the m.g.f. about the value $x = a$ is defined
as the expected value of $\exp[t(x - a)]$. Denoting this by $M_a(t)$
we have then

$$M_a(t) = \int \exp[t(x - a)]\,\phi(x)\,dx, \tag{25}$$

provided the integral has a meaning; and the expansion of the
exponential, and term by term integration, then show that the
coefficient of $t^r/r!$ is the moment of order $r$ about the value $x = a$.
Since the factor $e^{-at}$ is independent of $x$, it follows from (25) that

$$M_a(t) = e^{-at} M_0(t), \tag{26}$$

the subscript indicating the value with respect to which the m.g.f.
is constructed.

When the variate $x$ takes only the discrete values $x_i$ with
probabilities $p_i$ ($i = 1, 2, \ldots, n$), the m.g.f. with respect to the value
$x = a$, being the expected value of $\exp[t(x - a)]$, is given by

$$M_a(t) = \sum_i p_i \exp[t(x_i - a)] = e^{-at} M_0(t) \tag{27}$$

as before; and the coefficient of $t^r/r!$ in the expansion is the moment
of order $r$ about the value in question. The m.g.f. will be found useful,
not only in calculating the moments of a distribution, but also in
leading to concise proofs of various theorems. Along with it may
be mentioned the characteristic function* (c.f.), $E(e^{itx})$, which is the
expected value of $e^{itx}$, where $i = \sqrt{-1}$ and $t$ is real.

Example 1. For the exponential distribution defined by $\phi(x) = ce^{-cx}$, in
which $c$ is positive and $x$ varies from 0 to $\infty$, the m.g.f. with respect to the
origin is

$$M(t) = c\int_0^{\infty} \exp(tx - cx)\,dx \qquad (|t| < c)$$
$$\phantom{M(t)} = \frac{c}{c - t} = 1 + \frac{t}{c} + \frac{t^2}{c^2} + \ldots,$$

showing that $\mu'_1 = 1/c$, $\mu'_2 = 2!/c^2$, $\ldots$ Thus the mean is $1/c$, and the
variance is given by

$$\mu_2 = \mu'_2 - \mu'^2_1 = 1/c^2,$$

and the S.D. by $\sigma = 1/c$.

* Cf. Lévy, 1925, 3, part II, chapters II and III; Cramér, 1937, 6,
pp. 23-68; and Kendall, 1943, 2, pp. 90-100.

Example 2. For the rectangular distribution, $\phi(x) = 1/2a$, $-a \le x \le a$,
we have

$$M(t) = \frac{e^{at} - e^{-at}}{2at} = \frac{\sinh at}{at} = 1 + \frac{a^2t^2}{3!} + \frac{a^4t^4}{5!} + \ldots$$

The moments of odd order are zero, and $\mu_{2r} = a^{2r}/(2r + 1)$.

Example 3. For the binomial distribution the m.g.f. with respect to the
origin is, in virtue of (27),

$$M(t) = q^n + nq^{n-1}pe^{t} + \binom{n}{2}q^{n-2}p^2e^{2t} + \ldots + p^ne^{nt} = (q + pe^{t})^n.$$

The mean, being the coefficient of $t$ in the expansion, is $np$. Similarly $\mu'_2$,
being the coefficient of $t^2/2!$, has the value

$$\mu'_2 = n(n - 1)p^2 + np.$$

Consequently the variance is given by

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = np(1 - p) = npq.$$
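
The coefficients in Example 3 can be read off mechanically. The following sketch, given only as an illustration, multiplies out $(q + pe^t)^n$ as a truncated power series in exact fractions and takes the coefficient of $t^r/r!$ as the rth moment about the origin. The helper names and the values $n = 10$, $p = 1/4$ are assumptions.

```python
from fractions import Fraction
from math import factorial

def series_mul(u, v, order):
    """Product of two power series (lists of coefficients), truncated at t**order."""
    w = [Fraction(0)] * (order + 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            if i + j <= order:
                w[i + j] += ui * vj
    return w

def binomial_raw_moments(n, p, order=4):
    """Moments about the origin: coefficients of t**r / r! in (q + p e^t)**n."""
    p = Fraction(p)
    q = 1 - p
    base = [p / factorial(k) for k in range(order + 1)]  # series of p * e^t
    base[0] += q                                          # add the constant term q
    mgf = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(n):
        mgf = series_mul(mgf, base, order)
    return [mgf[r] * factorial(r) for r in range(order + 1)]

n, p = 10, Fraction(1, 4)   # illustrative values (assumptions)
mu = binomial_raw_moments(n, p)
mean, variance = mu[1], mu[2] - mu[1] ** 2
print(mean, variance)       # to be compared with n*p and n*p*q
```

With these values the mean and variance should equal $np = 5/2$ and $npq = 15/8$ exactly.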

A very important property of the m.g.f. is expressed in the
theorem:

The moment generating function of the sum of two independent
variates is the product of their moment generating functions.

This is a direct consequence of the theorem concerning the
expected value of the product of two independent variates. For, if
$x$ and $y$ are independent variates, the m.g.f. of their sum with respect
to the origin is

$$E[e^{t(x + y)}] = E(e^{tx}e^{ty}) = E(e^{tx})\,E(e^{ty}),$$

and is therefore the product of their m.g.f.'s. And, since the origin
may be chosen at pleasure, the theorem holds for the m.g.f.'s about
any specified value.

Let $\mu_r$, $\mu'_r$ denote the moments of $x$ about the mean and the origin,
and $m_r$, $m'_r$ those of $y$. Then, by the above theorem, the m.g.f. of
$x + y$ has an expansion obtained from the product

$$\left(1 + \mu'_1 t + \mu'_2 \frac{t^2}{2!} + \ldots\right)\left(1 + m'_1 t + m'_2 \frac{t^2}{2!} + \ldots\right).$$

The coefficient of $t$ in this product is $\mu'_1 + m'_1$, so that the mean of the
sum of the variates is the sum of their means. From the coefficient
of $t^2/2!$ we find that the second moment of $x + y$ about the origin is
$\mu'_2 + 2\mu'_1 m'_1 + m'_2$. Consequently the second moment of $x + y$ about
its mean is

$$\mu'_2 + 2\mu'_1 m'_1 + m'_2 - (\mu'_1 + m'_1)^2 = (\mu'_2 - \mu'^2_1) + (m'_2 - m'^2_1) = \mu_2 + m_2,$$

and is thus equal to the sum of the variances of $x$ and $y$. Since the
variance of $-y$ is equal to that of $y$, we have again the theorem:

The variance of the sum (or difference) of two independent variates
is equal to the sum of their variances.
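
The theorem can be verified on any pair of discrete variates by forming the distribution of the sum or difference directly. The sketch below is an illustration with assumed data: a fair die and a hypothetical biased coin, treated as independent. All names are chosen for the example.

```python
from fractions import Fraction
from itertools import product

def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var

def combine(dx, dy, sign=1):
    """Distribution of x + sign*y for independent x and y."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        out[x + sign * y] = out.get(x + sign * y, 0) + px * py
    return out

die = {k: Fraction(1, 6) for k in range(1, 7)}    # a fair die
coin = {0: Fraction(1, 3), 1: Fraction(2, 3)}     # a biased coin (hypothetical)
vx, vy = mean_var(die)[1], mean_var(coin)[1]
v_sum = mean_var(combine(die, coin, +1))[1]
v_diff = mean_var(combine(die, coin, -1))[1]
print(vx + vy, v_sum, v_diff)
```

The three printed values should coincide, the variance of the difference being the same as that of the sum.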

From their definitions it is clear that the moment generating
function, when it exists, and the characteristic function, which
always exists, are determined by the probability density $\phi(x)$
of the distribution. Conversely, the distribution is determined
uniquely by its characteristic function,* or by the moment
generating function when the moments satisfy certain conditions;†
that is to say, variates which have the same moment generating
function conform to the same distribution. Proofs of these
statements are beyond the scope of this book. In the three cases in
which we shall make use of this converse theorem (pp. 49, 57-8
and 151) the moments of the distributions satisfy the necessary
conditions.

16. Cumulative function of a distribution 

If the logarithm of the m.g.f. of a distribution can be expanded
as a convergent series in powers of $t$, viz.

$$K(t) = \log M(t) = \kappa_1 t + \kappa_2 \frac{t^2}{2!} + \kappa_3 \frac{t^3}{3!} + \ldots + \kappa_r \frac{t^r}{r!} + \ldots, \tag{28}$$

the coefficients, $\kappa_r$, are called the cumulants (or seminvariants‡) of
the distribution, and $K(t)$ is the cumulative function. The cumulants
are determinate functions of the moments. For instance, on taking
logarithms of both members of (24) and identifying with (28), we
see that $\kappa_1 = \mu'_1$, and is thus equal to the mean of the distribution.

* For a proof of this converse theorem see Lévy, 1925, 3, pp. 166-7;
also Deltheil, Erreurs et Moindres Carrés, pp. 26-9 (Fasc. 2, Tome 1 of
Traité du Calcul des Probabilités et de ses Applications, ed. E. Borel),
Gauthier-Villars, Paris, 1930, and Kendall, 1943, 2, pp. 90-4.

† Cf. Kendall, 1943, 2, pp. 105-10.

‡ We adopt Fisher's terminology of cumulants (1929, 1) rather than
Thiele's of seminvariants (1903, 1), since Dressel has shown that the cumulants
are only one particular set of seminvariants (Ann. Math. Stat. vol. xi, pp.
33-57, 1940).

Since, by (26), the m.g.f. with respect to the mean is

$$M_{\bar{x}}(t) = e^{-\bar{x}t} M_0(t),$$

it is clear on taking logarithms that the cumulative function with
respect to the mean differs from that with respect to $x = 0$ only
by the addition of the term $-\bar{x}t$. Consequently the cumulative
function relative to the mean is

$$\kappa_2 \frac{t^2}{2!} + \kappa_3 \frac{t^3}{3!} + \kappa_4 \frac{t^4}{4!} + \ldots \tag{29}$$

And since this must be identical with

$$\log\left(1 + \mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + \mu_4 \frac{t^4}{4!} + \ldots\right),$$

we see, on comparing coefficients of like powers of $t$, that the first
few cumulants are given in terms of the moments of the distribution
by the formulae

$$\kappa_1 = \mu'_1 = \text{mean}, \qquad \kappa_2 = \mu_2, \qquad \kappa_3 = \mu_3, \qquad \kappa_4 = \mu_4 - 3\mu_2^2. \tag{30}$$

Thus all the cumulants after the first are independent of the value
with respect to which the cumulative function is constructed. The
mean, and the moments about the mean, may therefore be found by
calculating the cumulative function with respect to any convenient
origin. Also, from the last of the relations (30), we have

$$\frac{\kappa_4}{\kappa_2^2} = \frac{\mu_4}{\mu_2^2} - 3,$$

and this is the excess of kurtosis, as defined at the close of §6.
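
The relations (30) are easily applied in practice. The sketch below is an illustration only: it converts the mean and the moments about the mean into the first four cumulants, and tries the conversion on the exponential distribution of §15, Ex. 1, for which the mean and S.D. are both $\sigma = 1/c$ and the central moments are $\sigma^2$, $2\sigma^3$ and $9\sigma^4$. The function name and the value of $\sigma$ are assumptions.

```python
from fractions import Fraction
from math import factorial

def cumulants_from_central_moments(mean, mu2, mu3, mu4):
    """First four cumulants from the mean and the moments about the mean, by (30)."""
    return mean, mu2, mu3, mu4 - 3 * mu2 ** 2

sigma = Fraction(1, 2)   # illustrative value (an assumption); sigma = 1/c
k = cumulants_from_central_moments(sigma, sigma**2, 2 * sigma**3, 9 * sigma**4)
# For the exponential distribution the rth cumulant is (r - 1)! * sigma**r.
expected = tuple(factorial(r - 1) * sigma**r for r in range(1, 5))
excess = k[3] / k[1] ** 2     # kappa_4 / kappa_2^2, the excess of kurtosis
print(k == expected, excess)
```

The cumulants obtained agree with $(r - 1)!\,\sigma^r$, and the excess of kurtosis of the exponential distribution comes out as 6 whatever the value of $\sigma$.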

Example. Since by §15, Ex. 1, the m.g.f. of the exponential distribution,
with respect to the origin, is

$$M_0(t) = (1 - \sigma t)^{-1},$$

where $\sigma = 1/c$, the cumulative function is

$$K(t) = -\log(1 - \sigma t) = \sigma t + \frac{\sigma^2 t^2}{2} + \frac{\sigma^3 t^3}{3} + \ldots$$

Thus $\kappa_r = (r - 1)!\,\sigma^r$.

The mean, being the coefficient of $t$, is $\sigma$. The variance is $\sigma^2$, $\mu_3 = 2\sigma^3$, and

$$\mu_4 = (3! + 3)\sigma^4 = 9\sigma^4.$$

Further, since the m.g.f. of the sum of two independent variates,
$x$ and $y$, is equal to the product of their m.g.f.'s, it follows that the
cumulative function of the distribution of $x + y$ is the sum of those
of $x$ and $y$. Equating coefficients of like powers of $t$, we have the
simple result that the rth cumulant of $x + y$ is the sum of the rth
cumulants of $x$ and $y$. This is the additive property of cumulants. The
theorem is obviously true for the sum of any finite number of
independent variates. The particular case of second cumulants gives again
the theorem on the variance of the sum of several independent variates.

The above property of cumulants may be used to estimate the
average corrections* to be applied to the moments of a grouped
distribution, with specified class interval $c$, when the interval-mesh
is located at random on the ungrouped distribution. The corrections
found, which are of the same form as Sheppard's adjustments, may
be wrong in any individual instance, but their average effect in a
large number of cases will be correct. In any class the mid-value $x_i$,
from which the moments of the grouped distribution are calculated,
is the sum of the true value $x$ of the observation and the grouping
error $x_i - x$. In consequence of the random location of the class
limits, the grouping error is uniformly distributed over the range
$-\tfrac{1}{2}c$ to $\tfrac{1}{2}c$. The average cumulants of the grouped distribution
will differ from those of the ungrouped by the cumulants of the
grouping error. Now the m.g.f. for the uniform distribution of
error is

$$M(t) = \frac{\sinh \tfrac{1}{2}ct}{\tfrac{1}{2}ct} = 1 + \frac{c^2t^2}{2^2 \cdot 3!} + \frac{c^4t^4}{2^4 \cdot 5!} + \ldots,$$

and its cumulative function is therefore

$$K = \frac{c^2}{12}\frac{t^2}{2!} - \frac{c^4}{120}\frac{t^4}{4!} + \frac{c^6}{252}\frac{t^6}{6!} - \ldots$$

If then $\kappa'_r$ are the cumulants of the grouped distribution, those of
the ungrouped distribution are

$$\kappa_1 = \kappa'_1, \qquad \kappa_2 = \kappa'_2 - \frac{c^2}{12}, \qquad \kappa_3 = \kappa'_3, \qquad \kappa_4 = \kappa'_4 + \frac{c^4}{120}, \ \ldots$$

* Cf. Cornish and Fisher, 1937, 7, pp. 3-4, and Kendall, 1943, 2, pp. 74-6.

Denoting the moments of the grouped distribution by $m_r$ and $m'_r$,
we therefore have for those of the ungrouped distribution

$$\mu'_1 = m'_1, \qquad \mu_2 = m_2 - \frac{c^2}{12}, \qquad \mu_3 = m_3 \tag{32}$$

and

$$\mu_4 = m_4 - \tfrac{1}{2}c^2 m_2 + \frac{7c^4}{240}. \tag{33}$$

The equation (32) shows that the estimate $m'_1$ of the mean, and the
estimate $m_3$ of the third moment about the mean, as found from the
grouped distribution, are sufficiently accurate. The calculated
variance $m_2$ should be diminished by $c^2/12$; while the adjustment to
the fourth moment $m_4$ is given by (33).
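
The adjustments (32) and (33) amount to a few lines of arithmetic. The sketch below is a minimal illustration: it applies them to assumed values of the grouped moments and then confirms that the result is consistent with the cumulant relation $\kappa_4 = \kappa'_4 + c^4/120$. The function name and the numerical values are hypothetical.

```python
from fractions import Fraction

def ungrouped_moments(m2, m3, m4, c):
    """Average corrections (32) and (33) for a class interval c."""
    mu2 = m2 - c**2 / 12
    mu3 = m3
    mu4 = m4 - c**2 * m2 / 2 + 7 * c**4 / 240
    return mu2, mu3, mu4

# Hypothetical grouped moments about the mean, and class interval c = 2.
m2, m3, m4, c = Fraction(5), Fraction(1), Fraction(80), Fraction(2)
mu2, mu3, mu4 = ungrouped_moments(m2, m3, m4, c)

# Consistency check through the fourth cumulant.
kappa4_grouped = m4 - 3 * m2**2
kappa4_ungrouped = mu4 - 3 * mu2**2
print(mu2, mu3, mu4, kappa4_ungrouped == kappa4_grouped + c**4 / 120)
```

Passing fractions keeps the arithmetic exact; ordinary floating-point numbers may be passed instead when exactness is not needed.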
EXAMPLES II

1. Two cards are drawn at random from a well-shuffled pack of
52. Show that the chance of drawing two aces is 1/221.

2. The chance of throwing a 6 at least once in two throws of a
die is 11/36.

3. A and B toss a coin alternately on the understanding that
the first to obtain heads wins the toss. Show that their respective
chances of winning are 2/3 and 1/3.

4. Four persons are chosen at random from a group containing
3 men, 2 women and 4 children. The chance that exactly two of
them will be children is 10/21.

5. From an urn containing $r$ red and $s$ white balls, $a + b$ balls are
drawn at random without replacement ($a \le r$, $b \le s$). Show that the
chance that $a$ of them are red and $b$ white is

$$\binom{r}{a}\binom{s}{b} \bigg/ \binom{r + s}{a + b}.$$

6. Show that, in a single throw with two dice, the chance of
throwing more than 7 is equal to that of throwing less than 7,
each being 5/12.

7. A and B take turns in throwing two dice, the first to throw 9
being awarded the prize. Show that their chances of winning are
in the ratio 9:8.

8. Three men toss in succession for a prize to be given to the one
who first obtains heads. Show that their chances of winning are
4/7, 2/7 and 1/7.

9. Eight coins are thrown simultaneously. Show that the chance
of obtaining at least six heads is 37/256.

10. The expectation of the number of failures preceding the first
success in an indefinite series of independent trials, with constant
probability $p$ of success, is

$$(1 - p)/p.$$

(Uspensky, 1937, 2, p. 178, Ex. 3.)

11. A point P is taken at random in a line AB, of length $2a$, all
positions of the point being equally likely. Show that the expected
value of the area of the rectangle AP . PB is $2a^2/3$, and that the
probability of the area exceeding $\tfrac{1}{2}a^2$ is $1/\sqrt{2}$.

12. From a point on the circumference of a circle of radius $a$,
a chord is drawn in a random direction (i.e. all directions are equally
likely). Show that the expected value of the length of the chord
is $4a/\pi$, and that the variance of the length is $2a^2(1 - 8/\pi^2)$. Also
show that the chance is 1/3 that the length of the chord will exceed
the length of the side of an equilateral triangle inscribed in the
circle.

13. A chord of a circle of radius $a$ is drawn parallel to a given
straight line, all distances from the centre of the circle being equally
likely. Show that the expected value of the length of the chord is
$\tfrac{1}{2}\pi a$, and that the variance of the length is $\dfrac{a^2}{12}(32 - 3\pi^2)$. Also show
that the chance is 1/2 that the length of the chord will exceed the
length of the side of an equilateral triangle inscribed in the circle.

14. Two different digits are chosen at random from the set
1, 2, 3, ..., 8. Show that the probability that the sum of the digits
will be equal to 5 is the same as the probability that their sum will
exceed 13, each being 1/14. Also show that the chance of both digits
exceeding 5 is 3/28.

15. In Poisson's distribution the variate takes the values
0, 1, 2, 3, ... with probabilities proportional to $1, m, m^2/2!, m^3/3!, \ldots$,
both sequences being infinite. Show that the mean of the
distribution is $m$, and the variance also $m$.

16. The equation §15 (26), written for $a = \bar{x} = \mu'_1$, is equivalent to

$$1 + \mu_2 \frac{t^2}{2!} + \mu_3 \frac{t^3}{3!} + \ldots = e^{-\bar{x}t}\left(1 + \mu'_1 t + \mu'_2 \frac{t^2}{2!} + \mu'_3 \frac{t^3}{3!} + \ldots\right).$$

Equating coefficients of like powers of $t$, deduce the relations
§4 (17), (18).

17. Prove that the next two formulae corresponding to §16 (30)
are

$$\kappa_5 = \mu_5 - 10\mu_3\mu_2, \qquad \kappa_6 = \mu_6 - 15\mu_4\mu_2 - 10\mu_3^2 + 30\mu_2^3.$$

18. Two independent variates are each uniformly distributed
within the range $-a$ to $a$. Show that their sum $x$ has a probability
density given by

$$\phi(x) = (2a + x)/4a^2 \qquad (-2a \le x \le 0),$$
$$\phi(x) = (2a - x)/4a^2 \qquad (0 \le x \le 2a).$$

Verify that the m.g.f., calculated from this value of $\phi(x)$, is equal
to $\left(\dfrac{\sinh at}{at}\right)^2$.

19. On the x-axis $n + 1$ points are taken independently between
the origin and $x = 1$, all positions being equally likely. Show that
the probability that the $(k + 1)$th of these points, counted from the
origin, lies in the interval $x - \tfrac{1}{2}dx$ to $x + \tfrac{1}{2}dx$ is

$$(n + 1)\binom{n}{k} x^k (1 - x)^{n - k}\,dx.$$

Verify that the integral of this expression, from $x = 0$ to $x = 1$, is
unity. (Aitken, 1939, 1, p. 71.)

20. Show that, for the binomial distribution, 



21. Show that the expected value of the product of the numbers
of points showing after an unbiased throw of $n$ ordinary dice is $(7/2)^n$.

22. Show that, if $p$ may be varied, the probability of $m$
successes in a series of $n$ independent trials, with the same
probability $p$ of success, is greatest when $p = m/n$.

23. Show that, if $y$ and $z$ are independent random values of a
variate $x$, the expected value of $(y - z)^2$ is twice the variance of the
distribution of $x$.

24. Defining the harmonic mean (H.M.) of a variate $x$ as the
reciprocal of the expected value of $1/x$, show that the H.M. of the
variate which ranges from 0 to $\infty$ with probability density $x^n e^{-x}/n!$
is $n$, given that $n$ is positive.



CHAPTER III 
SOME STANDARD DISTRIBUTIONS 

BINOMIAL AND POISSONIAN DISTRIBUTIONS 

17. The binomial distribution 

We have considered the binomial distribution in connection
with the probabilities of the various numbers of successes in a
series of $n$ independent trials, in each of which the chance of success
is equal to $p$. The mean and the variance of the distribution were
determined by means of the property that the expected value of a
sum of variates is equal to the sum of their expected values. These
may also be found by direct calculation. Thus

$$\begin{aligned}
E(x) &= 0 \cdot q^n + 1 \cdot nq^{n-1}p + 2 \cdot \binom{n}{2}q^{n-2}p^2 + \ldots + np^n \\
&= np\left[q^{n-1} + (n - 1)q^{n-2}p + \ldots + p^{n-1}\right] \\
&= np(q + p)^{n-1} = np.
\end{aligned} \tag{1}$$

Thus the mean of the distribution is $np$. In order to find the variance
calculate first the second moment about $x = 0$. Thus

$$\begin{aligned}
\mu'_2 &= 0^2 \cdot q^n + 1^2 \cdot nq^{n-1}p + 2^2 \cdot \binom{n}{2}q^{n-2}p^2 + \ldots + n^2 \cdot p^n \\
&= np\left[q^{n-1} + 2(n - 1)q^{n-2}p + 3\binom{n-1}{2}q^{n-3}p^2 + \ldots + np^{n-1}\right].
\end{aligned}$$

Now the expression in brackets is the first moment, about the value
$x = -1$, for the binomial distribution in which $n$ is replaced by
$n - 1$. This first moment, being the excess of the mean of the
distribution above $x = -1$, is equal to $(n - 1)p + 1$, in virtue of (1).
Consequently

$$\mu'_2 = np[(n - 1)p + 1] = np(np + q),$$

and the variance of the binomial distribution is then given by

$$\mu_2 = \mu'_2 - \mu'^2_1 = np(np + q) - n^2p^2 = npq, \tag{2}$$

and the standard deviation by

$$\sigma = \sqrt{npq}. \tag{3}$$

The proportion or relative frequency of successes is the number of
successes divided by $n$. Hence the mean value of this proportion
is $p$, and its standard deviation is $\sqrt{pq/n}$.

A binomial frequency distribution is one in which the relative
frequencies of the values 0, 1, ..., $n$ of the variate are equal to the
probabilities in the above distribution. As an example we may
take the distribution of expected frequencies of 0, 1, ..., $n$ successes
when the set of $n$ trials is to be made $N$ times. For, in virtue of (1)
and the known probability of $r$ successes, the expected frequency of
$r$ successes in $N$ sets of $n$ trials each is $N\binom{n}{r}p^rq^{n-r}$. The properties
of the distribution are the same whether it is regarded as one of
probability or one of frequency. But the reader may note that, in
a theoretical frequency distribution like the above, the individual
frequencies are not necessarily integral.

Example 1. Verify the above value of $\mu'_2$ by direct algebraical
simplification of the expression.

Example 2. Show that the m.g.f. of the binomial distribution with respect
to its mean is

$$\begin{aligned}
M(t) &= e^{-npt}(q + pe^{t})^n = (qe^{-pt} + pe^{qt})^n \\
&= \left[1 + pq\frac{t^2}{2!} + pq(q^2 - p^2)\frac{t^3}{3!} + pq(q^3 + p^3)\frac{t^4}{4!} + \ldots\right]^n,
\end{aligned}$$

and deduce that

$$\mu_2 = npq, \qquad \mu_3 = npq(q - p), \qquad \mu_4 = npq[1 + 3(n - 2)pq].$$

18. Poisson's distribution 

An important distribution, associated with the name of Poisson,
is one obtainable from the binomial distribution by putting $p = m/n$,
where $m$ is a constant, and letting $n$ increase indefinitely. Thus the
number of trials in the series becomes very large, and the
probability of success in a trial very small. Now it can be shown* that,
on the above assumption as $n$ tends to infinity,

$$\lim \binom{n}{r} p^r q^{n-r} = \frac{m^r e^{-m}}{r!}.$$

* See Mathematical Note I, at the end of this chapter.

Thus, in the limiting form of the distribution, the probability of $r$
successes in the infinite series of trials is $m^r e^{-m}/r!$. The chances of
0, 1, 2, 3, ... successes in the infinite series are

$$e^{-m}, \quad e^{-m}m, \quad e^{-m}m^2/2!, \quad e^{-m}m^3/3!, \ \ldots \tag{4}$$

respectively. The reader may show, as in the case of the binomial
distribution, that the most probable number of successes is the
integral part of $m$. It is obvious that the sum of the probabilities
of 0, 1, 2, ... successes is $e^{-m}e^{m} = 1$, as it should be.
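
The approach of the binomial probabilities to the limiting values (4) may be watched numerically. The sketch below is an illustration only: for a fixed $m$ it puts $p = m/n$ and records the largest difference between the binomial and Poissonian probabilities of 0, 1, ..., 10 successes as $n$ increases. The function names, the value $m = 2$ and the values of $n$ are assumptions.

```python
import math

def binomial_prob(n, p, r):
    """Probability of r successes in n trials with chance p of success."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def poisson_prob(m, r):
    """Limiting probability m^r e^(-m) / r! of r successes."""
    return math.exp(-m) * m**r / math.factorial(r)

def max_gap(n, m, r_max=10):
    """Largest |binomial - Poisson| over r = 0, ..., r_max, with p = m/n."""
    return max(abs(binomial_prob(n, m / n, r) - poisson_prob(m, r))
               for r in range(r_max + 1))

m = 2.0   # illustrative value of the constant m = n*p (an assumption)
for n in (10, 100, 10000):
    print(n, max_gap(n, m))
```

The recorded difference should fall steadily as $n$ is increased, roughly in proportion to $1/n$.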

The mean value and the variance of Poisson's distribution may be
deduced from those of the binomial by putting $p = m/n$, and
letting $n$ tend to infinity. Thus the mean value is $\lim np = m$.
Similarly, the variance is $\lim npq = \lim mq = m$, since $q$ tends to
unity as $p$ tends to zero. These results may, of course, be deduced
by direct calculation. Thus the mean, being the first moment about
$x = 0$, is given by

$$\mu'_1 = \sum_{r=0}^{\infty} r\,\frac{e^{-m}m^r}{r!} = me^{-m}\sum_{r=1}^{\infty} \frac{m^{r-1}}{(r - 1)!} = m. \tag{5}$$

Similarly

$$\mu'_2 = \sum_{r=0}^{\infty} r^2\,\frac{e^{-m}m^r}{r!} = m(m + 1),$$

as the reader may easily verify. Consequently

$$\sigma^2 = \mu_2 = \mu'_2 - \mu'^2_1 = m(m + 1) - m^2 = m, \tag{6}$$

as found above. The S.D. is therefore $\sqrt{m}$.

The same results may be obtained by using the generating
functions of §15 and §16. Thus the m.g.f. of the Poissonian
distribution with respect to the origin is

$$M(t) = \sum_{r=0}^{\infty} \frac{e^{-m}m^r}{r!}e^{tr} = e^{-m}\exp(me^{t}) = \exp[m(e^{t} - 1)],$$

and the cumulative function is therefore

$$K(t) = m(e^{t} - 1) = m\left(t + \frac{t^2}{2!} + \frac{t^3}{3!} + \ldots\right).$$

Thus $\kappa_1 = \kappa_2 = \kappa_3 = \ldots = m$. The mean and the variance are each
equal to $m$, as is also the third moment about the mean. The fourth
moment about the mean is

$$\mu_4 = \kappa_4 + 3\kappa_2^2 = m + 3m^2.$$

Further, if $n$ independent variates $x_i$ conform to Poissonian
distributions with means $m_i$ ($i = 1, 2, \ldots, n$), it follows from the
theorem of §15 that the m.g.f. of their sum is $\exp\left[(e^{t} - 1)\sum m_i\right]$.
But this is the m.g.f. of a Poissonian distribution whose mean is
$\sum m_i$. Hence the theorem:

The sum of any finite number of independent Poissonian variates
is itself a Poissonian variate, with mean equal to the sum of the means
of the separate variates.*
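
The theorem can also be checked by forming the distribution of the sum directly. The sketch below is an illustration with assumed means: it computes the probability that two independent Poissonian variates add up to $r$ by summing over the ways of dividing $r$ between them, and compares the result with the Poissonian probability whose mean is the sum of the means. The function names are hypothetical.

```python
import math

def poisson_prob(m, r):
    """Probability m^r e^(-m) / r! of the value r."""
    return math.exp(-m) * m**r / math.factorial(r)

def sum_prob(m1, m2, r):
    """P(x1 + x2 = r) for independent Poissonian x1, x2, by direct convolution."""
    return sum(poisson_prob(m1, k) * poisson_prob(m2, r - k) for k in range(r + 1))

m1, m2 = 0.7, 1.8   # illustrative means (assumptions)
worst = max(abs(sum_prob(m1, m2, r) - poisson_prob(m1 + m2, r)) for r in range(15))
print(worst)        # should differ from zero by rounding error only
```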

A Poissonian frequency distribution is one in which the relative 
frequencies of the values 0, 1, 2, 3, ... of the variate are equal to the 
probabilities in the above distribution. As an example we have the 
distribution of the expected frequencies of the various numbers of 
successes when the extensive series of trials is repeated N times. 
But in such a theoretical distribution the individual frequencies 
are not necessarily integral. Frequency distributions which are 
approximately Poissonian do arise in connection with the number 
of happenings of a rare event in an extensive series of trials. 

Example. In 1,000 consecutive issues of The Utopian Seven-daily Chronicle
the deaths of centenarians were recorded,† the number $x$ having frequency
$f$ according to the table

x:    0    1    2    3    4    5    6    7    8
f:  229  325  257  119   50   17    2    1    0

Show that the distribution is roughly Poissonian by calculating its mean
($m = 1.5$), and then the frequencies in the Poissonian distribution with the
same mean and the same total frequency of 1,000. The latter are
approximately 223.1, 334.7, 251.0, 125.5, 47.1, 14.1, 3.5, 0.8, 0.2. Also calculate the
variance of the given distribution, and compare it with the mean.

* An elementary proof of this theorem will be found in Ex. 4 at the end
of this chapter.

† Cf. Lucy Whittaker, Biometrika, vol. x, 1914, p. 36.

THE NORMAL DISTRIBUTION

19. Derivation from the binomial distribution

The binomial and Poissonian are distributions of discrete values.
We pass now to a continuous distribution of fundamental importance.
This is the normal distribution, which may be derived from the
binomial in the following manner. In the latter the probability of
the value $r$ of the variate is $\binom{n}{r}p^rq^{n-r}$. Let $x$ be the deviation of
the variate from the mean value $np$ of $r$, so that

$$r = np + x. \tag{7}$$

Then the probability of the value $x$, being the same as that of the
corresponding value of $r$, is

$$\binom{n}{r}p^rq^{n-r}, \qquad r = np + x.$$



It can be shown* that, for large values of n, this probability can be
expressed

$$P = \frac{1}{\sqrt{2\pi npq}} \exp\left(-\frac{x^2}{2npq}\right)(1+\epsilon),$$ (8)

where $\epsilon$ tends to zero as n tends to infinity, provided that neither
p nor q is very small, and x is of lower order than $n^{2/3}$.
Now introduce a variable z defined by

$$z = x/\sqrt{n},$$

and examine the probability distribution of z, and its limiting form
as n tends to infinity. The probability for any interval in the range
of z is equal to that of the corresponding interval for x. To unit
interval in the range of x there corresponds the interval $1/\sqrt{n}$ in
that of z; and, when n tends to infinity, this may be denoted by dz.
Then the probability that z falls in the interval dz is the probability
that x falls in unit interval which includes the value x; and this is
given by (8) which, when n tends to infinity, takes the form

$$dP = \frac{dz}{\sqrt{2\pi pq}} \exp\left(-\frac{z^2}{2pq}\right).$$


* For the details of this step see Mathematical Note II, at the end of this
chapter.

The limiting form of the distribution of z is thus a continuous
distribution with probability density

$$\phi(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{z^2}{2\sigma^2}\right),$$ (9)

where $\sigma^2 = pq$. This is the normal probability function, and the
corresponding continuous distribution is the normal distribution.
Since $\sqrt{npq}$ is the S.D. of the binomial distribution, and therefore
of the variate x, it follows that $\sqrt{pq}$ is the S.D. of z. Hence $\sigma$ in (9)
is the S.D. of the normal distribution. Since $\phi(z)$ is an even function
of z, the distribution is symmetrical about z = 0, which is therefore
the mean value of the distribution. Also, since z must lie between
$-\infty$ and $\infty$, the integral of (9) between these limits is equal to
unity, so that

$$\int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{2\pi}.$$ (10)



This is an important integral. An independent proof of the formula 
is given in Mathematical Note III, at the end of this chapter. 
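
The closeness of the approximation (8) may be seen numerically. The sketch below, in Python, compares the exact binomial probability with the exponential expression of (8), the factor $1+\epsilon$ being omitted. The values n = 400 and p = 0.5, and the function names, are illustrative assumptions and are not taken from the text.

```python
from math import comb, exp, pi, sqrt

def binomial_prob(n, p, r):
    """Exact binomial probability of the value r in n trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def normal_approx(n, p, r):
    """Expression (8) without the factor (1 + epsilon); x = r - np."""
    q = 1 - p
    x = r - n * p
    return exp(-x * x / (2 * n * p * q)) / sqrt(2 * pi * n * p * q)

n, p = 400, 0.5   # illustrative values only
for r in (200, 210, 220):
    print(r, binomial_prob(n, p, r), normal_approx(n, p, r))
```

For these values the two columns should agree to within a small fraction of their size, and the agreement improves as n is increased.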

A normal frequency distribution is a continuous one, in which 
the relative frequency density f(x) is identical with the function $\phi(x)$
defined by (9). A distribution of discrete values cannot be normal.
If, however, the total frequency is large, it is possible for the discrete
distribution to approximate to the normal. The meaning of such an
approximation was explained in § 6.

In later chapters, particularly in connection with sampling theory, 
we shall meet distributions which are accurately normal, and others 
which are approximately so. Since the normal distribution is an 
ideal to which some distributions attain, and to which others 
approximate, it is important to know its properties. Fortunately, 
the quantities associated with the distribution are easy to calculate. 

20. Some properties of the normal distribution 

As we have just seen, a continuous variate x is normally distributed,
with mean zero and S.D. $\sigma$, when the range of the variate
is from $-\infty$ to $\infty$, and the probability density is given by

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right).$$ (11)

The probability curve is therefore the curve

$$y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right).$$ (12)



This curve is symmetrical about the line x = 0 through the mean of
the distribution. It is a uni-modal curve, in which the ordinates
decrease rapidly as |x| increases. By equating to zero the second
derivative of y, the reader will easily verify that the points of
inflexion on the curve are given by $x = \pm\sigma$. The ordinates of the
normal curve are given in Table 1, corresponding to values of $x/\sigma$
at intervals of 0.01.

FIG. 3. The Normal Probability Curve

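Since the distribution is symmetrical, the moments of odd order
about the mean are all zero. To find the moments of even order we
observe that the moment of order 2n about the mean is

$$\mu_{2n} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x^{2n} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx.$$

Integration by parts then gives

$$\mu_{2n} = \frac{1}{\sigma\sqrt{2\pi}} \left\{ \left[ -\sigma^2 x^{2n-1} \exp\left(-\frac{x^2}{2\sigma^2}\right) \right]_{-\infty}^{\infty} + (2n-1)\sigma^2 \int_{-\infty}^{\infty} x^{2n-2} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx \right\}.$$

The expression in square brackets vanishes at both limits; and we
may therefore write the relation

$$\mu_{2n} = (2n-1)\,\sigma^2 \mu_{2n-2}.$$
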
TABLE 1. Ordinates of the Normal Curve

The origin of x is at the mean. The table gives the values of
$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$,
which is $\sigma$ times the ordinate of the normal curve $y = \phi(x)$.

$x/\sigma$   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09

  0.0      .3989  .3989  .3989  .3988  .3986  .3984  .3982  .3980  .3977  .3973
  0.1      .3970  .3965  .3961  .3956  .3951  .3945  .3939  .3932  .3925  .3918
  0.2      .3910  .3902  .3894  .3885  .3876  .3867  .3857  .3847  .3836  .3825
  0.3      .3814  .3802  .3790  .3778  .3765  .3752  .3739  .3725  .3712  .3697
  0.4      .3683  .3668  .3653  .3637  .3621  .3605  .3589  .3572  .3555  .3538
  0.5      .3521  .3503  .3485  .3467  .3448  .3429  .3410  .3391  .3372  .3352
  0.6      .3332  .3312  .3292  .3271  .3251  .3230  .3209  .3187  .3166  .3144
  0.7      .3123  .3101  .3079  .3056  .3034  .3011  .2989  .2966  .2943  .2920
  0.8      .2897  .2874  .2850  .2827  .2803  .2780  .2756  .2732  .2709  .2685
  0.9      .2661  .2637  .2613  .2589  .2565  .2541  .2516  .2492  .2468  .2444
  1.0      .2420  .2396  .2371  .2347  .2323  .2299  .2275  .2251  .2227  .2203
  1.1      .2179  .2155  .2131  .2107  .2083  .2059  .2036  .2012  .1989  .1965
  1.2      .1942  .1919  .1895  .1872  .1849  .1826  .1804  .1781  .1758  .1736
  1.3      .1714  .1691  .1669  .1647  .1626  .1604  .1582  .1561  .1539  .1518
  1.4      .1497  .1476  .1456  .1435  .1415  .1394  .1374  .1354  .1334  .1315
  1.5      .1295  .1276  .1257  .1238  .1219  .1200  .1182  .1163  .1145  .1127
  1.6      .1109  .1092  .1074  .1057  .1040  .1023  .1006  .0989  .0973  .0957
  1.7      .0940  .0925  .0909  .0893  .0878  .0863  .0848  .0833  .0818  .0804
  1.8      .0790  .0775  .0761  .0748  .0734  .0721  .0707  .0694  .0681  .0669
  1.9      .0656  .0644  .0632  .0620  .0608  .0596  .0584  .0573  .0562  .0551
  2.0      .0540  .0529  .0519  .0508  .0498  .0488  .0478  .0468  .0459  .0449
  2.1      .0440  .0431  .0422  .0413  .0404  .0395  .0387  .0379  .0371  .0363
  2.2      .0355  .0347  .0339  .0332  .0325  .0317  .0310  .0303  .0297  .0290
  2.3      .0283  .0277  .0270  .0264  .0258  .0252  .0246  .0241  .0235  .0229
  2.4      .0224  .0219  .0213  .0208  .0203  .0198  .0194  .0189  .0184  .0180
  2.5      .0175  .0171  .0167  .0163  .0158  .0154  .0151  .0147  .0143  .0139
  2.6      .0136  .0132  .0129  .0126  .0122  .0119  .0116  .0113  .0110  .0107
  2.7      .0104  .0101  .0099  .0096  .0093  .0091  .0088  .0086  .0084  .0081
  2.8      .0079  .0077  .0075  .0073  .0071  .0069  .0067  .0065  .0063  .0061
  2.9      .0060  .0058  .0056  .0055  .0053  .0051  .0050  .0048  .0047  .0046
  3.0      .0044  .0043  .0042  .0040  .0039  .0038  .0037  .0036  .0035  .0034
  3.1      .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026  .0025  .0025
  3.2      .0024  .0023  .0022  .0022  .0021  .0020  .0020  .0019  .0018  .0018
  3.3      .0017  .0017  .0016  .0016  .0015  .0015  .0014  .0014  .0013  .0013
  3.4      .0012  .0012  .0012  .0011  .0011  .0010  .0010  .0010  .0009  .0009
  3.5      .0009  .0008  .0008  .0008  .0008  .0007  .0007  .0007  .0007  .0006
  3.6      .0006  .0006  .0006  .0005  .0005  .0005  .0005  .0005  .0005  .0004
  3.7      .0004  .0004  .0004  .0004  .0004  .0004  .0003  .0003  .0003  .0003
  3.8      .0003  .0003  .0003  .0003  .0003  .0002  .0002  .0002  .0002  .0002
  3.9      .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0001  .0001

Repeated application of this reduction formula shows that

$$\mu_{2n} = (2n-1)(2n-3) \ldots 3 \cdot 1 \cdot \sigma^{2n} \mu_0.$$

But, for any distribution, $\mu_0 = 1$, and therefore

$$\mu_{2n} = (2n-1)(2n-3) \ldots 3 \cdot 1 \cdot \sigma^{2n}.$$ (13)

In particular, the variance is $\sigma^2$ and the S.D. is $\sigma$, as proved above.
Similarly,

$$\mu_4 = 3\sigma^4.$$ (14)

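The moments may also be obtained from the moment generating
function of the normal distribution. This function, relative to the
mean x = 0, is

$$M(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(tx - \frac{x^2}{2\sigma^2}\right) dx
      = \exp(\tfrac{1}{2}\sigma^2 t^2)\, \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(-\frac{(x-\sigma^2 t)^2}{2\sigma^2}\right) dx
      = \exp(\tfrac{1}{2}\sigma^2 t^2),$$ (15)

in virtue of (10). Since the expansion of this expression involves
only even powers of t, the moments of odd order about the mean are
zero, as is obvious from symmetry. The moment of order 2n is the
coefficient of $t^{2n}/(2n)!$ in the expansion; and this has the value

$$\mu_{2n} = (\tfrac{1}{2}\sigma^2)^n (2n)!/n! = 1 \cdot 3 \cdot 5 \ldots (2n-1)\, \sigma^{2n},$$

as found above. The cumulative function is, in virtue of (15),

$$K(t) = \log M(t) = \tfrac{1}{2}\sigma^2 t^2,$$

so that all the cumulants after the second are equal to zero.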
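The mean deviation from the mean is easily calculated. Its value is

$$\int_{-\infty}^{\infty} |x|\,\phi(x)\,dx = 2\int_0^{\infty} x\,\phi(x)\,dx$$

in virtue of symmetry; and this integral

$$= \frac{\sqrt{2}}{\sigma\sqrt{\pi}} \int_0^{\infty} x \exp\left(-\frac{x^2}{2\sigma^2}\right) dx
 = \frac{\sqrt{2}}{\sigma\sqrt{\pi}} \left[-\sigma^2 \exp\left(-\frac{x^2}{2\sigma^2}\right)\right]_0^{\infty}
 = \sigma\sqrt{\frac{2}{\pi}} = 0.7979\sigma = \tfrac{4}{5}\sigma \text{ approximately.}$$

In a normal distribution with mean $\bar{x}$ and S.D. $\sigma$, $x - \bar{x}$ is the
deviation of the variable from the mean, and the probability density
is

$$\phi(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\bar{x})^2}{2\sigma^2}\right).$$ (16)

Conversely, a probability density of this form defines a normal
distribution with mean $\bar{x}$ and S.D. $\sigma$.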
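
The moment formulae (13) and (14), and the value found above for the mean deviation, may be checked by numerical integration. The following Python sketch uses the trapezoidal rule for a distribution with mean zero. The value $\sigma = 1.5$, the range of integration and the helper names `phi`, `integrate` and `moment` are illustrative assumptions.

```python
from math import exp, pi, sqrt

def phi(x, sigma):
    """Normal probability density with mean zero and S.D. sigma, as in (11)."""
    return exp(-x * x / (2 * sigma * sigma)) / (sigma * sqrt(2 * pi))

def integrate(g, a, b, steps=20000):
    """Composite trapezoidal rule for g on the interval [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, steps))
    return total * h

sigma = 1.5                          # illustrative value
lower, upper = -12 * sigma, 12 * sigma   # the tails beyond are negligible

def moment(k):
    """Moment of order k about the mean."""
    return integrate(lambda x: x ** k * phi(x, sigma), lower, upper)

mu2, mu4, mu6 = moment(2), moment(4), moment(6)
mean_dev = integrate(lambda x: abs(x) * phi(x, sigma), lower, upper)
print(mu2, mu4, mu6, mean_dev)
```

The printed values may be compared with $\sigma^2$, $3\sigma^4$, $15\sigma^6$ and $\sigma\sqrt{2/\pi}$ respectively.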

21. Probabilities and relative frequencies for various intervals 

The probability corresponding to any interval in the range of 
the variate is represented by the area under the curve (12) within 
that interval; and the same is true for the relative frequency in the 
case of a normal frequency distribution. In particular, the probability
for the interval from the mean (zero) to the value x is given
by the integral

$$P = \frac{1}{\sigma\sqrt{2\pi}} \int_0^x \exp\left(-\frac{x^2}{2\sigma^2}\right) dx.$$

Putting $t = x/\sigma$ we see that this is equivalent to

$$P = \frac{1}{\sqrt{2\pi}} \int_0^{x/\sigma} \exp(-\tfrac{1}{2}t^2)\, dt.$$ (17)

This probability is therefore a function of $x/\sigma$. In the accompanying
table the values of this integral are given for different
values of $x/\sigma$ at intervals of 0.01.

Using Table 2 the reader will see that, if $x/\sigma = 1$, the area is
0.3413. The area within the interval $x = -\sigma$ to $x = \sigma$ is therefore
0.6826, and the area outside this interval 0.3174, which is less than
1/3. In other words, the probability that a random value of a normal
variate will deviate more than $\sigma$ from the mean, is less than 1/3.
Similarly, for $x/\sigma = 2$ the value of the integral (17) is 0.4772, and
the area for the interval $x = -2\sigma$ to $x = 2\sigma$ is 0.9544. The area
outside this interval is thus 0.0456. The probability that a random
value of x will deviate more than $2\sigma$ from the mean is thus about
4½ %. Similarly, the area outside the range $x = -2.5\sigma$ to $x = 2.5\sigma$
is 0.0124, which is about 1¼ % of the whole; and that outside the
range $x = -3\sigma$ to $x = 3\sigma$ is only 0.0027, or about ¼ % of the whole.
The reader may verify, and will find it convenient to remember,
that the deviation from the mean which is exceeded with a probability
of 5 % is $1.96\sigma$; and that which is exceeded with a
probability of 1 % is $2.58\sigma$.
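
The areas quoted above may be recomputed without the table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper names `area_from_mean` and `prob_outside` are illustrative choices, not part of the text.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal: mean 0, S.D. 1, so the argument is x/sigma

def area_from_mean(k):
    """Area under the normal curve from the mean to x = k*sigma, as in (17)."""
    return std.cdf(k) - 0.5

def prob_outside(k):
    """Probability that the deviation from the mean exceeds k*sigma in absolute value."""
    return 2 * (1 - std.cdf(k))

for k in (1, 2, 2.5, 3):
    print(k, round(area_from_mean(k), 4), round(prob_outside(k), 4))

# Deviations (in units of sigma) exceeded with probability 5 % and 1 %.
print(round(std.inv_cdf(0.975), 2), round(std.inv_cdf(0.995), 2))
```

Small differences in the last decimal place from the figures in the text are to be expected, since the latter are obtained from four-figure tabular values.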

TABLE 2. Area under the Normal Curve

The area is measured from the mean, x = 0, to any ordinate x.
The results are given for values of $x/\sigma$ at intervals of 0.01

$x/\sigma$   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09

  0.0      .0000  .0040  .0080  .0120  .0159  .0199  .0239  .0279  .0319  .0359
  0.1      .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
  0.2      .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
  0.3      .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
  0.4      .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
  0.5      .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
  0.6      .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2518  .2549
  0.7      .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
  0.8      .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
  0.9      .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
  1.0      .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
  1.1      .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
  1.2      .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
  1.3      .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
  1.4      .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
  1.5      .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4430  .4441
  1.6      .4452  .4463  .4474  .4485  .4495  .4505  .4515  .4525  .4535  .4545
  1.7      .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
  1.8      .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
  1.9      .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4762  .4767
  2.0      .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
  2.1      .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
  2.2      .4861  .4865  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
  2.3      .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
  2.4      .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
  2.5      .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
  2.6      .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
  2.7      .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
  2.8      .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4980  .4980  .4981
  2.9      .4981  .4982  .4983  .4983  .4984  .4984  .4985  .4985  .4986  .4986
  3.0      .49865 .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
  3.1      .49903 .4991  .4991  .4991  .4992  .4992  .4992  .4992  .4993  .4993



From Table 2 we may find the quartiles of the normal distribution
with mean at x = 0. The relative frequency from the mean to
the upper quartile is 0.25, which is therefore the corresponding area
under the curve. Interpolating with the aid of Table 2 we find
$x/\sigma = 0.6745$, and the upper quartile is therefore $0.6745\sigma$. Similarly,
the lower quartile is $-0.6745\sigma$.

Example 1. Using Table 2 show that the 6th, 7th, 8th and 9th deciles
are $0.253\sigma$, $0.525\sigma$, $0.842\sigma$ and $1.282\sigma$ respectively.
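
The quartiles and deciles may also be found by inverting the cumulative probability directly, instead of interpolating in the table. The sketch below is one way of doing so in Python, with `NormalDist.inv_cdf` from the standard library. The helper name `quantile_in_sigma_units` is an illustrative choice. The values obtained should agree with those of the text to within about 0.001.

```python
from statistics import NormalDist

std = NormalDist()  # distribution of the standard normal variate x/sigma

def quantile_in_sigma_units(p):
    """Value of x/sigma below which a fraction p of the distribution lies."""
    return std.inv_cdf(p)

upper_quartile = quantile_in_sigma_units(0.75)
deciles = [quantile_in_sigma_units(k / 10) for k in (6, 7, 8, 9)]
print(round(upper_quartile, 4), [round(d, 3) for d in deciles])
```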

Example 2. For a normal distribution, with mean $\bar{x} = 1$ and S.D. 3, find
the probabilities for the intervals

(i) x = 3.43 to x = 6.19, (ii) x = -1.43 to x = 6.19.

Let x' denote the deviation from the mean. Then, in the first part of the
example, the values of $x'/\sigma$ for the bounds of the interval are 0.81 and 1.73.
The areas from the mean to the bounds of the interval are 0.2910 and 0.4582.
The area corresponding to the interval is the difference of these areas, and
is therefore 0.1672.

In the second part the values of $x'/\sigma$ corresponding to the bounds of the
interval are -0.81 and 1.73. The area from the lower bound to the mean is
0.2910, and that from the mean to the upper bound is 0.4582 as before.
The required area is the sum of these, viz. 0.7492.
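
The two probabilities of Example 2 may be confirmed as differences of the cumulative probability at the bounds of each interval. The following is a minimal Python sketch; the function name `interval_probability` is an illustrative choice.

```python
from statistics import NormalDist

dist = NormalDist(mu=1, sigma=3)  # mean 1 and S.D. 3, as in Example 2

def interval_probability(d, a, b):
    """Probability that a variate with distribution d falls between a and b."""
    return d.cdf(b) - d.cdf(a)

part_i = interval_probability(dist, 3.43, 6.19)
part_ii = interval_probability(dist, -1.43, 6.19)
print(round(part_i, 4), round(part_ii, 4))
```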

22. Distribution of a sum of independent normal variates 

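Consider the normal variate x, with mean a and variance $\sigma^2$.
Its m.g.f. with respect to the origin is

$$M(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left(tx - \frac{(x-a)^2}{2\sigma^2}\right) dx = \exp(at + \tfrac{1}{2}\sigma^2 t^2),$$ (18)

in virtue of (10). The cumulative function is

$$K(t) = \log M(t) = at + \tfrac{1}{2}\sigma^2 t^2.$$

If c is a constant, cx is normally distributed with mean ca and
variance $c^2\sigma^2$. Hence the m.g.f. of the distribution of cx is

$$\exp(cat + \tfrac{1}{2}c^2\sigma^2 t^2).$$

Let $x_i$ (i = 1, ..., n) be n independent normal variates, with means
$a_i$ and variances $\sigma_i^2$. Then, by the theorem of § 15, the m.g.f. of the
sum $\sum c_i x_i$ is

$$\prod_i \exp(c_i a_i t + \tfrac{1}{2} c_i^2 \sigma_i^2 t^2) = \exp\left\{\left(\sum c_i a_i\right) t + \tfrac{1}{2} \left(\sum c_i^2 \sigma_i^2\right) t^2\right\}.$$

But this is the m.g.f. of a normal distribution with mean $\sum c_i a_i$ and
variance $\sum c_i^2 \sigma_i^2$. Hence the theorem:

If the independent variates $x_i$ (i = 1, ..., n) are normally distributed
with means $a_i$ and variances $\sigma_i^2$, the variate $\sum c_i x_i$ is normally
distributed with mean $\sum c_i a_i$ and variance $\sum c_i^2 \sigma_i^2$.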

In particular, the sum (or the difference) of two independent 
normal variates is normally distributed, with variance equal to 
the sum of their variances. Also, if in the above theorem we put 

$a_i = a$, $\sigma_i = \sigma$, $c_i = 1/n$ (i = 1, 2, ..., n),

we have the important result:

If the independent variates $x_i$ (i = 1, ..., n) are normally distributed
about a common mean, a, with a common variance, $\sigma^2$, their mean is
also normally distributed about a, but with variance $\sigma^2/n$.
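
This result may be illustrated, though of course not proved, by random sampling. The Python sketch below draws many sets of n independent normal values and examines the mean and variance of their means. The parameter values, the seed and the function name `simulate_means` are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance

def simulate_means(a, sigma, n, trials, seed=0):
    """Draw `trials` sets of n independent normal variates; return the set means."""
    rng = random.Random(seed)  # fixed seed so that the run can be repeated
    return [fmean(rng.gauss(a, sigma) for _ in range(n)) for _ in range(trials)]

a, sigma, n = 10.0, 2.0, 4          # illustrative values only
means = simulate_means(a, sigma, n, trials=20000)
print(fmean(means), pvariance(means), sigma ** 2 / n)
```

The first printed value should lie near a, and the second near $\sigma^2/n$, the agreement being subject to the usual fluctuations of sampling.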

Other proofs of the above theorem will be found in Ex. 8 at the end
of this chapter.

EXAMPLES III

1. Show by direct calculation that the third moment of the
binomial distribution about x = 0 is

$$\mu'_3 = np + 3n(n-1)p^2 + n(n-1)(n-2)p^3,$$

and deduce that $\mu_3 = npq(q-p)$. Similarly, show that

$$\mu_4 = npq[1 + 3(n-2)pq].$$

2. Show that, for Poisson's distribution, the third and fourth
moments about x = 0 are given by

$$\mu'_3 = m[(m+1)^2 + m], \qquad \mu'_4 = m(m^3 + 6m^2 + 7m + 1),$$

and deduce that

$$\mu_3 = m, \qquad \mu_4 = 3m^2 + m.$$

Deduce these results also as limiting values of the corresponding 
moments in Ex. 1. 

3. In a normal distribution, whose mean is 2 and S.D. 3, find a 
value of the variate such that the probability of the interval from 
the mean to that value is 0.4115. (Ans. x = 6.05.) Find another
value such that the probability for the interval from x = 3.5 to
that value is 0.2307. (Ans. x = 6.26.)

4. If two independent variates, x and y, have Poissonian distributions
with means $m_1$ and $m_2$, their sum is a Poissonian variate with
mean $m_1 + m_2$. (Cf. § 18.)

The variates take only the values 0, 1, 2, 3, .... We require the
probability that their sum will take the value r. The probability
that simultaneously x will have the value s and y the value r - s is

$$\frac{m_1^s \exp(-m_1)}{s!} \cdot \frac{m_2^{r-s} \exp(-m_2)}{(r-s)!}.$$

Summing for all values of s from 0 to r, we see that the probability
that x + y will have the value r is

$$\sum_{s=0}^{r} \frac{m_1^s m_2^{r-s}}{s!\,(r-s)!} \exp(-m_1 - m_2) = \frac{(m_1 + m_2)^r \exp(-m_1 - m_2)}{r!}.$$

Consequently x + y is a Poissonian variate with mean $m_1 + m_2$. It
follows that, if the independent variates $x_i$ have Poissonian distributions
with means $m_i$ (i = 1, ..., n), their sum is a Poissonian variate
with mean $\sum m_i$.
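
The summation in this proof is easily verified numerically. The Python sketch below forms the probability of x + y = r by summing over s, and compares it with the Poissonian probability for mean $m_1 + m_2$. The means 0.7 and 1.8 and the function names are illustrative assumptions.

```python
from math import exp, factorial

def poisson(m, r):
    """Probability of the value r in a Poissonian distribution with mean m."""
    return m ** r * exp(-m) / factorial(r)

def prob_sum(m1, m2, r):
    """Probability that x + y = r, summing over s = 0, ..., r as in the proof."""
    return sum(poisson(m1, s) * poisson(m2, r - s) for s in range(r + 1))

m1, m2 = 0.7, 1.8   # illustrative means only
for r in range(6):
    print(r, prob_sum(m1, m2, r), poisson(m1 + m2, r))
```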

5. Consider the normal distribution which has the same mean
(3.95 lb.) and the same S.D. (0.46 lb.) as the distribution of yields
of grain in Ex. I, 2. Find the relative frequencies of the normal
distribution for the intervals corresponding to the various classes,
and deduce the class frequencies per 500 of the normal distribution.
This process is sometimes spoken of as fitting a normal distribution
to the data.

Take, for example, the class whose limits are 4.1 and 4.3 lb.
Since the mean is 3.95 we have, for the lower limit, $x/\sigma$ = 0.15/0.46
= 0.3261. The area from the mean to this limit is therefore 0.1278,
and the area to the right of it is 0.3722. Similarly, for the upper
limit $x/\sigma$ = 0.35/0.46 = 0.7609; and the area to the right of this is
0.2233. The difference of the areas to the right of the limits is that
corresponding to the interval. Its value is 0.1489. This is the relative
frequency for that interval.

The work can be tabulated as below. Here x denotes the deviation
of a class limit from the mean 3.95 lb. Knowing that $\sigma$ = 0.46 lb.
we can find $x/\sigma$ for each limit. The third column gives the area under
the normal curve, to the right of the class limit. The differences,
recorded in the next column, are the relative frequencies for the
various classes. These differences, multiplied by 500, give the normal
frequencies of the classes per 500 of the total. The last column
records the class frequencies in the given distribution. The lower
limit of the lowest class is taken as $-\infty$, and the upper limit of the
highest class as $\infty$, so as to include the whole of the normal distribution.

Fitting a normal distribution to that of 500 yields of grain

Class limit   $x/\sigma$     Area to right   Difference d    500d   Observed

 $-\infty$    $-\infty$        1.0000          0.0112         5.6       4
   2.9        -2.2826        0.9888          0.0212        10.6      15
   3.1        -1.8479        0.9676          0.0464        23.2      20
   3.3        -1.4130        0.9212          0.0851        42.6      47
   3.5        -0.9783        0.8361          0.1295        64.7      63
   3.7        -0.5435        0.7066          0.1633        81.6      78
   3.9        -0.1087        0.5433          0.1712        85.6      88
   4.1         0.3261        0.3722          0.1489        74.4      69
   4.3         0.7609        0.2233          0.1073        53.7      59
   4.5         1.1956        0.1160          0.0644        32.2      35
   4.7         1.6304        0.0516          0.0322        16.1      10
   4.9         2.0652        0.0194          0.0132         6.6       8
   5.1         2.5000        0.0062          0.0062         3.1       4
 $\infty$     $\infty$         0.0000
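
The tabulation above may be reproduced mechanically. The Python sketch below takes the mean, S.D., total frequency and class limits stated in this example and forms the differences of the areas to the right of successive limits. The function name `fitted_frequencies` is an illustrative choice, and `NormalDist.cdf` takes the place of Table 2.

```python
from statistics import NormalDist

mean, sd, total = 3.95, 0.46, 500          # values stated in Ex. 5
limits = [2.9, 3.1, 3.3, 3.5, 3.7, 3.9, 4.1, 4.3, 4.5, 4.7, 4.9, 5.1]

def fitted_frequencies(mean, sd, limits, total):
    """Normal class frequencies for classes bounded by -inf, the limits, and +inf."""
    dist = NormalDist(mean, sd)
    # Area to the right of each class limit, with 1 and 0 for the infinite limits.
    right = [1.0] + [1 - dist.cdf(x) for x in limits] + [0.0]
    # Each class frequency is the total times the difference of successive areas.
    return [total * (right[i] - right[i + 1]) for i in range(len(right) - 1)]

freqs = fitted_frequencies(mean, sd, limits, total)
for freq in freqs:
    print(round(freq, 1))
```

Slight disagreements in the last figure with the column 500d are to be expected, since the tabulated areas were rounded to four decimal places.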









6. As in the previous example, fit a normal distribution to the 
distribution of wages of 1,000 employees given in Ex. I, 3, showing
that the class frequencies per thousand of the normal distribution
are approximately 6.7, 11.3, 25.0, 48.0, 79.0, 113.1, 140.5, 151.0,
140.8, 113.5, 79.5, 48.1, 25.3, 11.5 and 6.7.

7. In 1,000 extensive sets of trials for an event of small probability, 
the frequencies $f_i$ of the numbers $x_i$ of successes proved to be

x:    0    1    2    3    4    5    6    7
f:  305  365  210   80   28    9    2    1

Show that the mean number of successes is 1.2, and hence that the
frequencies of the Poissonian distribution, with the same mean and
the same total frequency, are approximately 301.2, 361.4, 216.8,
86.7, 26.0, 6.2, 1.2, 0.2, .... Verify that the variance of the given
distribution is 1.28.

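8. Sum of independent normal variates. Another proof of the
theorem of § 22 may be given as follows. Let x and y be the
independent normal variates, with a common mean zero and variances
$\sigma_1^2$ and $\sigma_2^2$ respectively; and let u = x + y. Then, for a fixed value of y,
du = dx. The probability that the value of x will fall in the interval
dx is $\phi(x)\,dx$; and therefore, for a fixed value of y in the interval dy,
the probability that u will fall in the interval du is

$$dp_1 = \frac{du}{\sigma_1\sqrt{2\pi}} \exp\left(-\frac{(u-y)^2}{2\sigma_1^2}\right).$$

But the probability of y falling in the interval dy is

$$dp_2 = \frac{dy}{\sigma_2\sqrt{2\pi}} \exp\left(-\frac{y^2}{2\sigma_2^2}\right),$$

and the compound probability of u falling in the interval du and,
at the same time, y in the interval dy is the product $dp_1\,dp_2$.
Integrating with respect to y over its range of variation, we have the
probability that u will fall in the interval du, irrespective of the
value of y, as

$$dP = \frac{du}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} \exp\left[-\frac{(u-y)^2}{2\sigma_1^2} - \frac{y^2}{2\sigma_2^2}\right] dy.$$

In this expression the argument of the exponential may be put in
the form

$$-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)} - \frac{\sigma_1^2+\sigma_2^2}{2\sigma_1^2\sigma_2^2} \left(y - \frac{\sigma_2^2 u}{\sigma_1^2+\sigma_2^2}\right)^2.$$

If then we change the variable of integration from y to t, where

$$t = y - \frac{\sigma_2^2 u}{\sigma_1^2+\sigma_2^2},$$

we have

$$dP = \frac{du}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}\right] \int_{-\infty}^{\infty} \exp\left[-\frac{(\sigma_1^2+\sigma_2^2)\,t^2}{2\sigma_1^2\sigma_2^2}\right] dt
    = \frac{du}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}} \exp\left[-\frac{u^2}{2(\sigma_1^2+\sigma_2^2)}\right],$$

in virtue of (10). Thus u is normally distributed about zero as
mean, with variance $\sigma_1^2 + \sigma_2^2$. The reader should have no difficulty
in removing the restriction of a common zero mean for the
variates.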

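The following is still another proof of the theorem. With the
same notation let

$$u = x + y, \qquad v = \frac{\sigma_1}{\sigma_2}\,y - \frac{\sigma_2}{\sigma_1}\,x.$$

Then

$$\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2} = \frac{u^2+v^2}{\sigma_1^2+\sigma_2^2},$$

and, as the values of x and y range from $-\infty$ to $\infty$, so do those of
u and v. The probability that x and y fall simultaneously in the
respective intervals dx and dy is

$$\frac{dx\,dy}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2}\right)\right],$$

so that the probability density for the joint distribution of x and y,
i.e. the probability per unit area at the point (x, y), is

$$\frac{1}{2\pi\sigma_1\sigma_2} \exp\left[-\frac{u^2+v^2}{2(\sigma_1^2+\sigma_2^2)}\right].$$

Now the area of the element of the xy plane, bounded by the curves
along which u has the values u and u + du respectively, and those
along which v has the values v and v + dv, is

$$\frac{\partial(x,y)}{\partial(u,v)}\, du\,dv = \frac{\sigma_1\sigma_2}{\sigma_1^2+\sigma_2^2}\, du\,dv,$$

where $\partial(x,y)/\partial(u,v)$ is the Jacobian of x and y with respect to u and v.
Consequently the probability that, for a random choice of x and y,
the representative point (x, y) will fall in this element of area is

$$\frac{du\,dv}{2\pi\sigma^2} \exp\left[-\frac{u^2+v^2}{2\sigma^2}\right] = \frac{du}{\sigma\sqrt{2\pi}} \exp\left(-\frac{u^2}{2\sigma^2}\right) \cdot \frac{dv}{\sigma\sqrt{2\pi}} \exp\left(-\frac{v^2}{2\sigma^2}\right),$$

where $\sigma^2 = \sigma_1^2 + \sigma_2^2$. Since this is the probability that simultaneously
u will lie in the interval du and v in the interval dv, it follows that
these variates are independent, and that each is normally distributed
with zero as mean and $\sigma_1^2 + \sigma_2^2$ as variance.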

9. Using the formulae of Ex. I, 11 show that, for the binomial
distribution, the factorial moments about the mean are

$$\mu_{(2)} = npq, \qquad \mu_{(3)} = -2npq(p+1),$$

and that, for the Poissonian distribution,

$$\mu_{(2)} = m, \qquad \mu_{(3)} = -2m, \qquad \mu_{(4)} = 3m(m+2).$$



MATHEMATICAL NOTES 
NOTE I

Derivation of the Poissonian distribution (see § 18)

In the binomial distribution the probability of the value r of
the variate is $\binom{n}{r} p^r q^{n-r}$. We require the limiting value of this
expression when p = m/n and n tends to infinity, m being constant.
The expression may be written

$$\frac{m^r}{r!} \left(1 - \frac{m}{n}\right)^n \times \frac{n!}{(n-r)!\, n^r (1 - m/n)^r}.$$

The limiting value of that part which precedes the sign of
multiplication is $m^r e^{-m}/r!$. Then, using Stirling's formula for n!, viz.

$$n! \sim \sqrt{2\pi n}\; n^n e^{-n}$$

(the limiting value of the ratio of these two expressions, as n tends
to infinity, being unity), we have

$$\lim \frac{n!}{(n-r)!\, n^r (1 - m/n)^r}
 = \lim \frac{\sqrt{2\pi n}\; n^n e^{-n}}{\sqrt{2\pi (n-r)}\, (n-r)^{n-r} e^{-(n-r)}\, n^r (1 - m/n)^r}
 = \lim \frac{1}{e^r (1 - r/n)^{n-r+\frac{1}{2}} (1 - m/n)^r} = \frac{1}{e^r e^{-r}} = 1,$$

since r is finite. Consequently

$$\lim \binom{n}{r} p^r q^{n-r} = \frac{m^r e^{-m}}{r!},$$

which is the required probability of the value x = r in the Poissonian
distribution with mean m.
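
The approach of the binomial probability to its Poissonian limit may be watched numerically. The Python sketch below keeps m fixed, puts p = m/n, and lets n increase. The values m = 1.5 and r = 2 and the function names are illustrative assumptions.

```python
from math import comb, exp, factorial

def binomial_prob(n, p, r):
    """Binomial probability of the value r in n trials."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_prob(m, r):
    """Limiting (Poissonian) probability of the value r with mean m."""
    return m ** r * exp(-m) / factorial(r)

m, r = 1.5, 2   # illustrative values only
for n in (10, 100, 1000, 10000):
    print(n, binomial_prob(n, m / n, r), poisson_prob(m, r))
```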

NOTE II 

Derivation of the normal distribution (see § 19)

To examine the behaviour of the probability

$$P = \frac{n!}{(np+x)!\,(nq-x)!}\, p^{np+x} q^{nq-x}$$

as n tends to infinity, replace the factorials by their values given
by Stirling's formula (Note I). Then an easy algebraical simplification
leads to

$$P = \frac{1}{N\sqrt{2\pi npq}},$$

where

$$N = \left(1 + \frac{x}{np}\right)^{np+x+\frac{1}{2}} \left(1 - \frac{x}{nq}\right)^{nq-x+\frac{1}{2}}.$$

Taking logarithms of both sides we may write, provided |x| is less
than the smaller of the two quantities np and nq,

$$\log N = \frac{x}{2n}\left(\frac{1}{p} - \frac{1}{q}\right) + \frac{x^2}{2npq} - \frac{x^2}{4n^2}\left(\frac{1}{p^2} + \frac{1}{q^2}\right) + \frac{x^3}{6n^2}\left(\frac{1}{q^2} - \frac{1}{p^2}\right)$$

+ terms with higher powers of 1/n.
Introducing a new variable, $z = x/\sqrt{n}$, we may write the above

$$\log N = \frac{z}{2\sqrt{n}}\left(\frac{1}{p} - \frac{1}{q}\right) + \frac{z^2}{2pq} - \frac{z^2}{4n}\left(\frac{1}{p^2} + \frac{1}{q^2}\right) + \frac{z^3}{6\sqrt{n}}\left(\frac{1}{q^2} - \frac{1}{p^2}\right) + \ldots$$

This series is convergent so long as |z| is less than the smaller
of the quantities $p\sqrt{n}$ and $q\sqrt{n}$. Now let n tend to infinity. Then,
provided that neither p nor q is very small, and that |z| is either
finite or of lower order than $\sqrt[6]{n}$, all the terms of the above series tend
to zero except $z^2/2pq$; so that, when n tends to infinity, we have

$$\log N = z^2/2pq,$$

or $N = \exp(z^2/2pq)$.

Corresponding to unit increment in x we have the increment $1/\sqrt{n}$
in z, which may be denoted by dz when n tends to infinity. And, if
we write dP for the limiting value of P, the above formula for P
becomes

$$dP = \frac{dz}{\sqrt{2\pi pq}} \exp\left(-\frac{z^2}{2pq}\right),$$

giving the probability of z falling in the interval dz; and we have
the required continuous distribution of z.

NOTE III 
An important integral 

To evaluate the integral

$$\int_0^{\infty} \exp(-x^2)\, dx$$

let us write

$$I_1 = \int_0^a \exp(-x^2)\, dx = \int_0^a \exp(-y^2)\, dy.$$

Then

$$I_1^2 = \int_0^a \int_0^a \exp(-x^2 - y^2)\, dx\, dy.$$

If we regard x, y as Cartesian coordinates in a plane, the integration
is extended over the area of the square OAGB, bounded by x = 0,
x = a, y = 0, y = a (see Fig. 4). In polar coordinates the above
relation is

$$I_1^2 = \iint \exp(-r^2)\, r\, dr\, d\theta,$$

the integration being extended over the same square. Since the
integrand is positive, the integral is intermediate in value between
the integrals over the quadrants of circles, with centre O and radii
a and $a\sqrt{2}$ respectively. Consequently $I_1^2$ lies in value between

$$\int_0^{\pi/2} d\theta \int_0^a r \exp(-r^2)\, dr \quad \text{and} \quad \int_0^{\pi/2} d\theta \int_0^{a\sqrt{2}} r \exp(-r^2)\, dr.$$

But, as a tends to infinity, each of these integrals converges to $\tfrac{1}{4}\pi$.
Hence

$$\lim_{a \to \infty} I_1^2 = \tfrac{1}{4}\pi,$$

and therefore

$$\int_0^{\infty} \exp(-x^2)\, dx = \tfrac{1}{2}\sqrt{\pi}.$$

The substitution $x = z/(\sigma\sqrt{2})$ then gives

$$\int_0^{\infty} \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{\tfrac{1}{2}\pi},$$

and, since the integrand is an even function of z,

$$\int_{-\infty}^{\infty} \exp\left(-\frac{z^2}{2\sigma^2}\right) dz = \sigma\sqrt{2\pi}.$$

FIG. 4
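
The bounding argument of this Note may be illustrated numerically. In the Python sketch below the integrals over the two quadrants are evaluated in closed form, each being $\tfrac{1}{4}\pi(1 - e^{-R^2})$ for radius R, while $I_1$ is obtained independently from the error function `math.erf`. The use of `erf`, the function names and the trial values of a are assumptions made only for this check.

```python
from math import erf, exp, pi, sqrt

def i1_squared(a):
    """Square of I_1, the integral of exp(-x^2) from 0 to a, via the error function."""
    return (0.5 * sqrt(pi) * erf(a)) ** 2

def quadrant_integral(radius):
    """Integral of r*exp(-r^2) over a quadrant of a circle of the given radius."""
    return 0.25 * pi * (1 - exp(-radius ** 2))

for a in (0.5, 1.0, 2.0, 4.0):
    lower, upper = quadrant_integral(a), quadrant_integral(a * sqrt(2))
    print(a, lower, i1_squared(a), upper)
```

For each value of a the middle figure should lie between the other two, and all three approach $\tfrac{1}{4}\pi$ as a increases.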



CHAPTER IV 

BIVARIATE DISTRIBUTIONS. 
REGRESSION AND CORRELATION 

23. Discrete distributions. Moments 

Suppose that we have records for N marriages, giving the ages of 
bridegroom and bride, x and y years respectively. Then to each 
marriage corresponds a pair of values $(x_i, y_i)$ of the variables. Each
pair may be represented by a point $P_i$ in the x, y plane, the
coordinates of the point being the pair of values represented. Such a
graphical representation is called a scatter diagram. Possibly the
pairs of values are not all different. If the pair $(x_i, y_i)$ occurs $f_i$ times,
then $f_i$ is the frequency of that pair; and, as in the case of a single
variable,

$$\sum_{i=1}^{n} f_i = N,$$ (1)

n being the number of different pairs of values. The assemblage of
pairs of values, together with their frequencies, constitutes a
bivariate frequency distribution. Other examples of bivariate
distributions are furnished by the heights and the weights of a group
of men, or by the amount of fertilizer per acre and the yield of grain
per acre on a number of different plots of land.

The moments of a bivariate distribution are generalizations of
those of a univariate one. Thus the moment $\mu'_{rs}$ about the origin
(0, 0), of order r in x and s in y, is defined by

$$\mu'_{rs} = \frac{1}{N} \sum f_i x_i^r y_i^s.$$ (2)

In particular,

$$\mu'_{10} = \frac{1}{N} \sum f_i x_i = \bar{x}, \qquad \mu'_{01} = \frac{1}{N} \sum f_i y_i = \bar{y},$$ (3)

where $\bar{x}$ is the mean value of x in the distribution, and $\bar{y}$ the mean
value of y. Similarly,

$$\mu'_{20} = \frac{1}{N} \sum f_i x_i^2 = \sigma_x^2 + \bar{x}^2, \qquad \mu'_{02} = \frac{1}{N} \sum f_i y_i^2 = \sigma_y^2 + \bar{y}^2,$$ (4)

where $\sigma_x^2$ is the variance of the variable x in the distribution, or the
moment $\mu_{20}$ about the mean $(\bar{x}, \bar{y})$; and likewise $\sigma_y^2$ is the variance
of y, or the moment $\mu_{02}$ about the mean. Lastly, we may mention
the moment $\mu'_{11}$, which is given by

$$\mu'_{11} = \frac{1}{N} \sum f_i x_i y_i.$$

The corresponding moment $\mu_{11}$ about the mean of the distribution
is called the covariance* of the variables. Thus

$$\mu_{11} = \frac{1}{N} \sum f_i (x_i - \bar{x})(y_i - \bar{y})
         = \frac{1}{N} \sum f_i x_i y_i - \bar{x}\, \frac{1}{N} \sum f_i y_i - \bar{y}\, \frac{1}{N} \sum f_i x_i + \bar{x}\bar{y}
         = \mu'_{11} - \bar{x}\bar{y}$$ (5)

in virtue of (1) and (3). This formula is very important, and will be
used frequently.
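
The use of formula (5) may be illustrated on a small discrete distribution. In the Python sketch below the pairs of ages and their frequencies are invented purely for illustration, and the function name `bivariate_moments` is likewise an assumption. The covariance is found from the moments about the origin, and may be compared with the value given directly by the definition.

```python
def bivariate_moments(pairs):
    """pairs: list of (x, y, f). Returns the means, variances and covariance."""
    N = sum(f for _, _, f in pairs)                           # total frequency, (1)
    mean_x = sum(f * x for x, _, f in pairs) / N              # mu'_10, by (3)
    mean_y = sum(f * y for _, y, f in pairs) / N              # mu'_01, by (3)
    var_x = sum(f * x * x for x, _, f in pairs) / N - mean_x ** 2   # from (4)
    var_y = sum(f * y * y for _, y, f in pairs) / N - mean_y ** 2   # from (4)
    cov = sum(f * x * y for x, y, f in pairs) / N - mean_x * mean_y  # formula (5)
    return mean_x, mean_y, var_x, var_y, cov

# Hypothetical ages (bridegroom, bride) with frequencies; invented data.
data = [(24, 22, 3), (26, 23, 2), (30, 27, 4), (35, 30, 1)]
mean_x, mean_y, var_x, var_y, cov = bivariate_moments(data)

# Covariance directly from the definition, as the moment mu_11 about the mean.
direct = sum(f * (x - mean_x) * (y - mean_y) for x, y, f in data) / 10
print(cov, direct)
```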

24. Continuous distributions 

A continuous bivariate distribution includes all pairs of values 
represented by points within a certain region of the xy plane, each 
pair of values occurring at least once. The number of pairs of values 
is therefore infinite. Distributions of the type ordinarily employed 
have each for relative frequency density a continuous function f(x, y), 
which is such that the relative frequency for the infinitesimal 
rectangular region, of area dxdy, and defined by

$$x \pm \tfrac{1}{2}dx, \qquad y \pm \tfrac{1}{2}dy,$$

has the value f(x, y) dx dy. Thus f(x, y) is the relative frequency per
unit area at the point (x, y), and the sum of the relative frequencies
for all elements of the distribution is unity, so that

$$\iint f(x, y)\, dx\, dy = 1,$$ (6)

* Many writers denote this quantity by p. We are, however, loth to
overwork the symbol p by giving it still another meaning.

the integration extending over the region* of the plane which
represents the distribution. The mean $(\bar{x}, \bar{y})$ of the distribution is given by

$$\bar{x} = \iint x f(x, y)\, dx\, dy, \qquad \bar{y} = \iint y f(x, y)\, dx\, dy.$$ (7)

For higher moments we have formulae corresponding to those of
a discrete distribution. Thus

$$\mu_{20} = \iint (x - \bar{x})^2 f(x, y)\, dx\, dy
         = \mu'_{20} - 2\bar{x} \iint x f(x, y)\, dx\, dy + \bar{x}^2 = \mu'_{20} - \bar{x}^2,$$

which is equivalent to

$$\sigma_x^2 = \mu'_{20} - \bar{x}^2,$$ (8)

and similarly,

$$\sigma_y^2 = \mu'_{02} - \bar{y}^2.$$ (9)

The covariance, being the first product moment about the mean, is

$$\mu_{11} = \iint (x - \bar{x})(y - \bar{y}) f(x, y)\, dx\, dy
         = \mu'_{11} - \bar{x} \iint y f(x, y)\, dx\, dy - \bar{y} \iint x f(x, y)\, dx\, dy + \bar{x}\bar{y}
         = \mu'_{11} - \bar{x}\bar{y},$$ (10)

in virtue of (6) and (7).

It will be sufficient, as a rule, to give proofs of theorems for a 
discrete distribution. The student can easily rewrite these for the 
case of a continuous distribution, replacing sums by definite 
integrals, and the relative frequency $f_i/N$ by $f(x, y)\,dx\,dy$. Some of
the steps will be indicated in Examples. 

25. Lines of regression 

It frequently happens that the scatter diagram indicates an 
association between the variables, x and y, the distribution of dots 
being denser in the neighbourhood of a certain curve, which may 
be called a curve of regression. The equation of such a curve indicates 
a functional relationship to which the association of the variables 

* By defining $f(x, y)$ as equal to zero outside this region, we may make the
range of integration $-\infty$ to $\infty$ for each variable.

approximates, more or less roughly. We shall consider later the
problem of fitting different curves to the data, so as to obtain the 
curve of a specified form which best fits the data. For the present 
we confine our attention to the straight line which is the best fit, or 
more definitely, to the problem of determining two straight lines, 
one of which gives the closest estimate a straight line can give to 
the average value of y for each specified value of x, while the other 
gives the corresponding estimate of x for a given value of y. 
These are called the lines of regression of y on x, and of x on y 
respectively. 

Consider first the line of regression of y on x. We have to determine
constants a and b so that the equation

$$ y = a + bx \tag{11} $$

gives, for each value of x, the best estimate a linear equation can
give for the average value of y. We interpret the term 'best estimate'
in accordance with the principle of least squares; that is to say, we
find a and b so as to minimize the sum of the squares of the deviations
of the actual values of y in the distribution from their estimates
given by (11). Thus if $P_i$ is the point of the diagram representing the
pair of values $(x_i, y_i)$, and $H_i$ is the point on the straight line (11)
with the same abscissa $x_i$, the deviation of $P_i$ from its estimate is
$H_iP_i$, whose value is clearly $y_i - (a + bx_i)$. Thus we have to choose
a and b so as to make $\sum_i f_i (y_i - a - bx_i)^2$ a minimum, $f_i$ being the
frequency of the pair of values $(x_i, y_i)$.

[FIG. 5]

For a minimum value the partial derivatives of the above expression
with respect to a and b must both be zero. Hence the equations for
determining these constants are

$$ \sum_i f_i (y_i - a - bx_i) = 0, \qquad \sum_i f_i x_i (y_i - a - bx_i) = 0, \tag{12} $$

which are called the normal equations. In virtue of (3), (4) and (5)
they are equivalent to

$$ \bar{y} - a - b\bar{x} = 0 \tag{13} $$

and

$$ \mu_{11} + \bar{x}\bar{y} - a\bar{x} - b(\sigma_x^2 + \bar{x}^2) = 0. \tag{14} $$



The first of these shows that the required line passes through the
mean $(\bar{x}, \bar{y})$ of the distribution. Also, on eliminating a between
(13) and (14), we find

$$ \mu_{11} - b\sigma_x^2 = 0. $$

Consequently the gradient b of the line of regression of y on x is
given by

$$ b = \mu_{11}/\sigma_x^2, \tag{15} $$

and, since the line passes through $(\bar{x}, \bar{y})$, its equation may be expressed

$$ y - \bar{y} = \frac{\mu_{11}}{\sigma_x^2} (x - \bar{x}). \tag{16} $$

If the mean of the distribution is taken as origin, the line of
regression of y on x is

$$ y = \frac{\mu_{11}}{\sigma_x^2}\, x. \tag{17} $$

The gradient $\mu_{11}/\sigma_x^2$ is often called the coefficient of regression of
y on x.

Similarly, or by interchanging variables, we find that the line of
regression of x on y is

$$ x - \bar{x} = \frac{\mu_{11}}{\sigma_y^2} (y - \bar{y}), \tag{18} $$

and $\mu_{11}/\sigma_y^2$ is the coefficient of regression of x on y. The product of
the two coefficients of regression is symmetrical with respect to x
and y. Its square root, $\mu_{11}/\sigma_x\sigma_y$, is the coefficient of correlation, r.
Thus

$$ r = \frac{\mu_{11}}{\sigma_x \sigma_y}, \tag{19} $$

having the same sign as the covariance $\mu_{11}$.

Example 1. Find the tangent of the inclination, $\theta$, of (18) to (16).
Since the gradients of the two lines are $\sigma_y^2/\mu_{11}$ and $\mu_{11}/\sigma_x^2$, we have

$$ \tan\theta = \frac{\sigma_y^2/\mu_{11} - \mu_{11}/\sigma_x^2}{1 + \sigma_y^2/\sigma_x^2} = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}. $$

Example 2. For a continuous bivariate distribution the mean square
deviation from the line of regression of y on x is

$$ \iint (y - a - bx)^2 f(x, y)\,dx\,dy. $$

By minimizing this for variation of a and b, show that the normal equations are

$$ \iint (y - a - bx) f(x, y)\,dx\,dy = 0 = \iint x (y - a - bx) f(x, y)\,dx\,dy, $$

and that these are expressible in the forms (13) and (14).
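
The results of this section can be collected in a short computational sketch. The function name `regression_summary` and the sample data are assumptions made for illustration; the arithmetic follows (15), (18) and (19) for a discrete distribution given as (x, y, f) triples.

```python
import math

def regression_summary(data):
    """Means, regression coefficients and r for (x, y, frequency) triples."""
    N = sum(f for _, _, f in data)
    x_bar = sum(f * x for x, _, f in data) / N
    y_bar = sum(f * y for _, y, f in data) / N
    var_x = sum(f * (x - x_bar) ** 2 for x, _, f in data) / N
    var_y = sum(f * (y - y_bar) ** 2 for _, y, f in data) / N
    mu11 = sum(f * (x - x_bar) * (y - y_bar) for x, y, f in data) / N
    return {
        "x_bar": x_bar,
        "y_bar": y_bar,
        "b_yx": mu11 / var_x,                    # coefficient of regression of y on x
        "b_xy": mu11 / var_y,                    # coefficient of regression of x on y
        "r": mu11 / math.sqrt(var_x * var_y),    # coefficient of correlation
    }

# Hypothetical data lying exactly on the line y = 2x + 1.
exact = [(0, 1, 1), (1, 3, 2), (2, 5, 1)]
s = regression_summary(exact)
print(s)
```

For data lying exactly on a straight line the two coefficients of regression are reciprocals, so their product, which is r squared, equals unity.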

26. Coefficient of correlation. Standard error of estimate 

The significance of the correlation coefficient, r, as a measure of
the closeness of the association of the variables x and y, will be
apparent from the theorem now to be considered. The equation of
the line of regression of y on x was found by minimizing the sum of
the squares of the deviations $H_iP_i$. We shall now prove that the
sum of the squares of these deviations from the line of regression of
y on x is equal to $N\sigma_y^2(1 - r^2)$.

Let the mean of the distribution be taken as origin, so that
$\bar{x} = \bar{y} = 0$. Then, in virtue of (17), $H_iP_i$ is equal to $y_i - bx_i$, and the
required sum of squares of the deviations is

$$ \sum_i f_i (y_i - bx_i)^2 = \sum_i f_i y_i^2 - 2b \sum_i f_i x_i y_i + b^2 \sum_i f_i x_i^2 = N(\sigma_y^2 - 2b\mu_{11} + b^2\sigma_x^2) = N(\sigma_y^2 - \mu_{11}^2/\sigma_x^2) = N\sigma_y^2(1 - r^2), \tag{20} $$

as stated above. Denoting this sum of squares by $NS_y^2$ we have

$$ S_y^2 = \sigma_y^2 (1 - r^2) \tag{21} $$

or

$$ r^2 = 1 - S_y^2/\sigma_y^2. \tag{22} $$

Since $S_y^2$ is the mean square deviation of points from the line of
regression of y on x, $S_y$ is called the standard error of estimate of y
from the regression equation (16). In the same way the sum of the
squares of the deviations of points from the line of regression of x
on y, measured parallel to the x-axis, is $NS_x^2$, where

$$ S_x^2 = \sigma_x^2 (1 - r^2), \tag{23} $$

and $S_x$ is the standard error of estimate of x from the regression
equation (18).

Since the sum of squares of deviations cannot be negative, it follows
from (20) that $r^2 \le 1$, or

$$ -1 \le r \le 1. \tag{24} $$

If r = 1 or -1, the sum of squares of deviations from either line of
regression is zero. Consequently each deviation is zero, and all the
points lie on both lines of regression. These two lines then coincide,
and there is a linear functional relation between the variables x
and y, giving perfect correlation. The nearer $r^2$ is to unity, the closer
are the points to the lines of regression, and the nearer are these two
lines to coincidence (cf. §25, Ex. 1). Thus the magnitude of r may be
taken as a measure of the degree to which the association between the
variables approaches a linear functional relationship. The sign of r
is the same as that of the covariance $\mu_{11}$, and therefore also the same
as that of the gradients of the lines of regression. Hence r is positive
when, on the whole, y increases with x, and negative when y decreases
as x increases. When r is zero the variables are usually described as
uncorrelated.
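
The theorem of this section is easily checked numerically. The sketch below, using made-up (x, y, f) triples, fits the line of regression of y on x, computes the mean square deviation from it directly, and compares the result with the product of the variance of y and (1 - r^2).

```python
import math

# Hypothetical toy data: (x, y, frequency) triples, chosen only for illustration.
data = [(1.0, 2.0, 3), (2.0, 3.0, 5), (3.0, 5.0, 2)]

N = sum(f for _, _, f in data)
x_bar = sum(f * x for x, _, f in data) / N
y_bar = sum(f * y for _, y, f in data) / N
var_x = sum(f * (x - x_bar) ** 2 for x, _, f in data) / N
var_y = sum(f * (y - y_bar) ** 2 for _, y, f in data) / N
mu11 = sum(f * (x - x_bar) * (y - y_bar) for x, y, f in data) / N

b = mu11 / var_x              # gradient of the line of regression of y on x
a = y_bar - b * x_bar         # the line passes through the mean
r = mu11 / math.sqrt(var_x * var_y)

# Mean square deviation from the regression line, computed directly ...
S_y2_direct = sum(f * (y - (a + b * x)) ** 2 for x, y, f in data) / N
# ... and from the relation S_y^2 = sigma_y^2 (1 - r^2).
S_y2_formula = var_y * (1 - r ** 2)

print(r, S_y2_direct, S_y2_formula)
```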

The coefficient of correlation between two variables is also the
coefficient of correlation between the deviations of the variables
from their means. For the value of r depends only on $\mu_{11}$, $\sigma_x$ and $\sigma_y$,
and these are functions of the above deviations. Thus

$$ r = \frac{\sum_i f_i x'_i y'_i}{\sqrt{\sum_i f_i x_i'^2 \, \sum_i f_i y_i'^2}}, $$

and, in virtue of (3), (4) and (5), this may be expressed

$$ r = \frac{\sum_i f_i x_i y_i - N\bar{x}\bar{y}}{\sqrt{\left(\sum_i f_i x_i^2 - N\bar{x}^2\right)\left(\sum_i f_i y_i^2 - N\bar{y}^2\right)}}. \tag{25} $$

This formula is sometimes convenient.

Example. Taking the mean as origin for a continuous distribution, show
that the mean square deviation, $S_y^2$, from the line of regression of y on x
is given by

$$ S_y^2 = \iint (y - bx)^2 f(x, y)\,dx\,dy = \sigma_y^2 (1 - r^2). $$



27. Estimates from the regression equation 

A few simple relations between the $y_i$ and their estimates, $Y_i$,
given by the regression equation (11) or (16), play an important
part in later work. Thus

$$ Y_i = a + bx_i, \tag{26} $$

and the normal equations (12) are expressible as

$$ \sum_i f_i (y_i - Y_i) = 0 \tag{27} $$

and

$$ \sum_i f_i x_i (y_i - Y_i) = 0. \tag{28} $$

From (27) it follows that the mean of the Y's is equal to the mean
$\bar{y}$ of the y's. Also, on multiplying (27) and (28) by a and b respectively
and adding, we deduce

$$ \sum_i f_i Y_i (y_i - Y_i) = 0, \tag{29} $$

and therefore, in virtue of (27),

$$ \sum_i f_i (Y_i - \bar{y})(y_i - Y_i) = 0. \tag{30} $$

This relation is very important.

The sum of the squares of the deviations of the y's from their
mean may then be expressed

$$ \sum_i f_i (y_i - \bar{y})^2 = \sum_i f_i (y_i - Y_i)^2 + \sum_i f_i (Y_i - \bar{y})^2, \tag{31} $$

since the sum of the products vanishes by (30). Now the first
member of (31) has the value $N\sigma_y^2$. The first sum in the second
member has been proved equal to $N\sigma_y^2(1 - r^2)$. Consequently

$$ \sum_i f_i (Y_i - \bar{y})^2 = N r^2 \sigma_y^2, \tag{32} $$

showing that the variance of the Y's is $r^2$ times that of the y's; or

$$ \sigma_Y = |r|\,\sigma_y. \tag{33} $$

Finally, we may show that the coefficient of correlation between
the $y_i$ and their estimates $Y_i$ is equal to $|r|$. Take the origin at the
common mean of these variables, so that $\bar{y} = 0$. Then, in virtue
of (29),

$$ \sum_i f_i y_i Y_i = \sum_i f_i Y_i^2 = N\sigma_Y^2, $$

and the coefficient of correlation between the y's and the Y's is

$$ \frac{\sum_i f_i y_i Y_i}{N \sigma_y \sigma_Y} = \frac{\sigma_Y}{\sigma_y} = |r|. \tag{34} $$

Example 1. Prove (34) as follows. Taking the mean of the distribution as
origin, so that $Y_i = bx_i$, $\sigma_Y = |b|\sigma_x$, we have for the required correlation
coefficient

$$ \frac{\sum_i f_i y_i Y_i}{N \sigma_y \sigma_Y} = \frac{b \sum_i f_i x_i y_i}{N |b| \sigma_x \sigma_y} = \frac{b}{|b|}\, r = |r|, $$

since r and b have the same sign.

Example 2. Show that the normal equations for a continuous distribution
(cf. §25, Ex. 2) are expressible as

$$ \iint (y - Y) f(x, y)\,dx\,dy = 0, \qquad \iint x (y - Y) f(x, y)\,dx\,dy = 0, $$

and from these deduce the relations

$$ \iint Y (y - Y) f(x, y)\,dx\,dy = 0, \qquad \iint (y - Y)(Y - \bar{y}) f(x, y)\,dx\,dy = 0, $$

corresponding to (29) and (30). Hence show that

$$ \iint (y - \bar{y})^2 f(x, y)\,dx\,dy = \iint (y - Y)^2 f(x, y)\,dx\,dy + \iint (Y - \bar{y})^2 f(x, y)\,dx\,dy, $$

and deduce that

$$ \iint (Y - \bar{y})^2 f(x, y)\,dx\,dy = r^2 \sigma_y^2. $$
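
The relations between the observed values and their estimates may be verified on a small made-up distribution. In the sketch below the data are hypothetical and chosen to give a negative correlation, so that the distinction between r and |r| in (33) and (34) is visible.

```python
import math

# Hypothetical (x, y, frequency) triples with y tending to decrease as x increases.
data = [(1.0, 5.0, 2), (2.0, 3.0, 3), (3.0, 4.0, 1), (4.0, 1.0, 4)]

N = sum(f for _, _, f in data)
x_bar = sum(f * x for x, _, f in data) / N
y_bar = sum(f * y for _, y, f in data) / N
var_x = sum(f * (x - x_bar) ** 2 for x, _, f in data) / N
var_y = sum(f * (y - y_bar) ** 2 for _, y, f in data) / N
mu11 = sum(f * (x - x_bar) * (y - y_bar) for x, y, f in data) / N
b = mu11 / var_x
a = y_bar - b * x_bar
r = mu11 / math.sqrt(var_x * var_y)

# Estimates Y_i = a + b x_i paired with the observed y_i and frequency f_i.
est = [(x, y, a + b * x, f) for x, y, f in data]

sum_resid = sum(f * (y - Y) for _, y, Y, f in est)          # normal equation (27)
sum_x_resid = sum(f * x * (y - Y) for x, y, Y, f in est)    # normal equation (28)
var_Y = sum(f * (Y - y_bar) ** 2 for _, _, Y, f in est) / N
cov_yY = sum(f * (y - y_bar) * (Y - y_bar) for _, y, Y, f in est) / N
corr_yY = cov_yY / math.sqrt(var_y * var_Y)

print(r, var_Y / var_y, corr_yY)
```

Here r is negative, while the correlation between the y's and their estimates is positive and equal to |r|.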

28. Change of units 

Before illustrating the above theory by a numerical example, let
us examine the effect of a change of units on the calculation of r,
$\mu_{11}$ and b. As in §2, let u be the measure of the deviation of x from an
origin x = a, in terms of a unit c times the original x-unit, so that
x = a + cu. Then, if x' and u' are the deviations of the two variables
from their means, x' = cu', and

$$ \sigma_x = c\,\sigma_u. $$

Similarly, if v is the measure of the deviation of y from an origin
y = a', in terms of a unit c' times the original y-unit, y = a' + c'v,
y' = c'v' and $\sigma_y = c'\sigma_v$. Then the coefficient of correlation between
x and y is

$$ r = \frac{\sum_i f_i x'_i y'_i / N}{\sigma_x \sigma_y} = \frac{cc' \sum_i f_i u'_i v'_i / N}{c\,\sigma_u\, c'\sigma_v} = \frac{\text{covariance of } u \text{ and } v}{\sigma_u \sigma_v}, $$

and is thus equal to the correlation between u and v. The value of r
is therefore independent of the units employed, and is thus an
absolute measure of correlation. But the covariance of x and y is

$$ \mu_{11} = \frac{1}{N} \sum_i f_i x'_i y'_i = \frac{cc'}{N} \sum_i f_i u'_i v'_i = cc' \,(\text{covariance of } u \text{ and } v). \tag{35} $$

Similarly, the regression coefficient of y on x is

$$ b = \frac{\mu_{11}}{\sigma_x^2} = \frac{c'}{c} \,(\text{regression coefficient of } v \text{ on } u). \tag{36} $$

If then c and c' are equal, the value b is the same for both pairs of
variables.
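
The effect of a change of units may be illustrated numerically. In the following sketch the coded values (u, v, f), the origins and the units are all hypothetical; the helper `moments` is introduced only for this illustration.

```python
import math

def moments(data):
    """Return (covariance, variance of first, variance of second) for (p, q, f) triples."""
    N = sum(f for _, _, f in data)
    p_bar = sum(f * p for p, _, f in data) / N
    q_bar = sum(f * q for _, q, f in data) / N
    var_p = sum(f * (p - p_bar) ** 2 for p, _, f in data) / N
    var_q = sum(f * (q - q_bar) ** 2 for _, q, f in data) / N
    cov = sum(f * (p - p_bar) * (q - q_bar) for p, q, f in data) / N
    return cov, var_p, var_q

# Hypothetical coded data (u, v, f), with x = a + c*u and y = a2 + c2*v.
coded = [(-1, -1, 4), (0, -1, 2), (0, 0, 5), (1, 0, 3), (1, 1, 6)]
a, c = 27.5, 5.0      # origin and unit for x
a2, c2 = 22.5, 2.0    # origin and unit for y (a', c' in the text)
original = [(a + c * u, a2 + c2 * v, f) for u, v, f in coded]

cov_uv, var_u, var_v = moments(coded)
cov_xy, var_x, var_y = moments(original)

r_uv = cov_uv / math.sqrt(var_u * var_v)
r_xy = cov_xy / math.sqrt(var_x * var_y)
b_vu = cov_uv / var_u
b_yx = cov_xy / var_x

print(r_uv, r_xy, cov_xy / cov_uv, b_yx / b_vu)
```

The correlation is unchanged, the covariance is multiplied by cc', and the regression coefficient by c'/c, in agreement with (35) and (36).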

29. Numerical illustration 

For 1,000 marriages the ages of bridegroom and bride, x and y years 
respectively, are grouped in the table below with class interval of 5 years for 
each, the frequencies for the different classes being shown in the body of the 
table. Find the regression equations and the coefficient of correlation between 
the variables. 

Such a table is called a correlation table. The values of x and y indicated
are the mid-values in the classes. Thus for the class in which the age of the
bridegroom is between 25 and 30, and the age of the bride between 20 and 25,
the values of x and y are taken as 27.5 and 22.5 respectively, and the frequency
is 190. The data represented in any one column constitute a vertical array,
or an array of y's, because in each such array y assumes different values
while x remains constant. Similarly, the data in any row constitute a
horizontal array, or an array of x's. In the row prefixed $N_c$ the frequencies
are given for the individual columns, and in the column headed $N_r$ the
frequencies for the separate rows.

Let us take as new origin the point (27.5, 27.5), whose coordinates are
the mid-values of the class 25 to 30 for x and y; and, as a new unit
for each variable, the common class interval of 5 years. Then the deviations,
u and v, of the ages of bridegroom and bride from the new origin in terms of
the new unit are those shown in the second row and the second column. The
fourth row from the bottom of the table, prefixed $uN_c$, gives the sum $\sum fu$
for each column; and these are added horizontally to give the total sum 124
for the distribution. The next row gives the sum $\sum fu^2$ for each column, with
total sum 1,834 for the distribution. The row prefixed V gives for each
column the sum $\sum fv$. Thus the sum 28, in the column u = 2, is obtained
from the frequencies of that column, each multiplied by its value of v.

The last row gives the sum $\sum fuv$ for each column, each entry being obtained
by multiplying the value of V above by the common value of u for that
column. Summing horizontally all the values uV, we have the product sum
$\sum fuv$ for the whole distribution, viz. 1,109.

The columns to the right of the table are explained similarly. That headed
U gives for each row the sum $\sum fu$. Thus the first entry -79 in the column is
obtained from

$$ 11(-2) + 62(-1) + 19 \times 0 + 3 \times 1 + 1 \times 2 = -79. $$

The last column, headed vU, gives the sum $\sum fuv$ for each row, any entry
being obtained by multiplying the value of U to the left by the common value
of v for that row. Summing the entries in this column we find again the
product sum $\sum fuv$ for the whole distribution, thus providing a check on the
calculations.

Using the values thus obtained we have

$$ \bar{u} = 124/1000 = 0.124, \qquad \bar{v} = -371/1000 = -0.371. $$

Therefore

$$ \bar{x} = 27.5 + 5(0.124) = 28.120, \qquad \bar{y} = 27.5 - 5(0.371) = 25.645, $$

giving the mean ages of bridegroom and bride. The variance of u is

$$ \sigma_u^2 = 1.834 - (0.124)^2 = 1.8186, $$

so that $\sigma_u = 1.349 = 1.35$ nearly, and therefore $\sigma_x = 6.745 = 6.75$ nearly.

Similarly, we find

$$ \sigma_v^2 = 1.531 - (0.371)^2 = 1.3934, \qquad \sigma_v = 1.18; \qquad \sigma_y = 5.90. $$

The coefficient of correlation between x and y is equal to that between u
and v. Thus

$$ r = \frac{\sum fuv/N - \bar{u}\bar{v}}{\sigma_u \sigma_v} = \frac{1.109 + 0.371 \times 0.124}{1.349 \times 1.180} = 0.726 = 0.73 \text{ nearly}. $$

Since the units for u and v are equal, the regression coefficient of y on x is
equal to that of v on u, which is

$$ (\text{covariance of } u \text{ and } v)/\sigma_u^2 = 1.155/1.8186 = 0.635. $$

The line of regression of y on x is therefore

$$ y - 25.645 = 0.635(x - 28.12), $$

that is

$$ y = 7.79 + 0.635x. $$

The student may find similarly that the line of regression of x on y is

$$ x = 6.86 + 0.829y. $$
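
The arithmetic of this illustration can be reproduced from the five sums quoted in the text. The sketch below takes those sums as given (the correlation table itself is not reproduced here) and uses variable names chosen only for this illustration.

```python
import math

# Sums quoted in the text for the coded variables u and v.
N = 1000
sum_fu, sum_fv = 124, -371
sum_fu2, sum_fv2 = 1834, 1531
sum_fuv = 1109

origin_x = origin_y = 27.5   # new origin (27.5, 27.5)
unit = 5.0                   # common class interval of 5 years

u_bar, v_bar = sum_fu / N, sum_fv / N
x_bar = origin_x + unit * u_bar
y_bar = origin_y + unit * v_bar

var_u = sum_fu2 / N - u_bar ** 2
var_v = sum_fv2 / N - v_bar ** 2
cov_uv = sum_fuv / N - u_bar * v_bar

r = cov_uv / math.sqrt(var_u * var_v)

# Equal units for u and v, so the regression coefficients carry over unchanged.
b_yx = cov_uv / var_u
b_xy = cov_uv / var_v
a_yx = y_bar - b_yx * x_bar   # intercept of the line of regression of y on x
a_xy = x_bar - b_xy * y_bar   # intercept of the line of regression of x on y

print(round(r, 3), round(b_yx, 3), round(a_yx, 2), round(b_xy, 3), round(a_xy, 2))
```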



30. Correlation of ranks 

A group of n individuals may be arranged in order of merit or
proficiency in the possession of a certain characteristic. The same
group would, as a rule, give different orders for different
characteristics. Considering the orders corresponding to two characteristics,
A and B, let $x_i$, $y_i$ be the ranks of the ith individual in A and B
respectively. Then the coefficient of correlation between the x's and
the y's is called the rank correlation coefficient in the characteristics
A and B for that group of individuals. On the assumption that no
two individuals are bracketed equal in either classification, each of
the variables takes the values 1, 2, 3, ..., n; and therefore

$$ \bar{x} = \tfrac{1}{2}(n + 1) = \bar{y}. \tag{37} $$

As a rule $x_i$ is not equal to $y_i$. Let $d_i$ denote the difference, so that

$$ d_i = x_i - y_i. \tag{38} $$

Then, if x' and y' denote the deviations of the variables from their
means, we have also

$$ d_i = x'_i - y'_i. \tag{39} $$

The coefficient of correlation between the variables is given by

$$ r = \frac{\sum_i x'_i y'_i}{n \sigma_x \sigma_y}. \tag{40} $$

To express this in terms of n and the differences, $d_i$, we observe that
the variance of each of the variables x and y is $(n^2 - 1)/12$ (cf. §3,
Ex. 2). Therefore

$$ \sum_i x_i'^2 = \sum_i y_i'^2 = \frac{n(n^2 - 1)}{12} = \frac{n^3 - n}{12}. \tag{41} $$

Also

$$ \sum_i d_i^2 = \sum_i x_i'^2 + \sum_i y_i'^2 - 2 \sum_i x'_i y'_i, $$

and thus, in virtue of (41),

$$ \sum_i x'_i y'_i = \frac{n^3 - n}{12} - \tfrac{1}{2} \sum_i d_i^2. $$


Substitution of these values in (40) gives

$$ r = 1 - \frac{6 \sum_i d_i^2}{n^3 - n}. \tag{42} $$

This is the required formula for the coefficient of correlation of
ranks.

If the correlation is perfect all the d's are zero, and r = 1. If the
orders in the two characteristics are exactly the reverse of each
other, $x_i + y_i = n + 1$. All the points of the scatter diagram then lie
on a straight line with negative gradient. Consequently r = -1,
and there is perfect inverse correlation.

Example. The ranks of the same 16 students in Mathematics and Physics 
were as follows, two numbers within brackets denoting the ranks of the same 
student in Mathematics and Physics respectively: (1, 1), (2, 10), (3, 3), (4,4), 
(5,5), (6,7), (7,2), (8,6), (9,8), (10,11), (11,15), (12,9), (13,14), (14,12), 
(15, 16), (16, 13). Calculate the rank correlation coefficient for proficiencies 
of this group in Mathematics and Physics. 

Here $d_i$ is the difference between the two numbers in the ith pair of brackets.
It is easily verified that $\sum d_i^2 = 136$, $n^3 - n = 16 \times 255$. Consequently

$$ r = 1 - \frac{6 \times 136}{16 \times 255} = 1 - \frac{1}{5} = 0.8. $$
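
The calculation in this example may be checked with a few lines of code. The function name `rank_correlation` is an assumption for illustration; it implements formula (42) and presumes, as in the text, that no two individuals are bracketed equal. The product-moment form (40) is evaluated as well, as a cross-check.

```python
import math

def rank_correlation(pairs):
    """Rank correlation from formula (42), assuming no tied ranks."""
    n = len(pairs)
    sum_d2 = sum((x - y) ** 2 for x, y in pairs)
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ranks of the 16 students in Mathematics and Physics, as given in the example.
ranks = [(1, 1), (2, 10), (3, 3), (4, 4), (5, 5), (6, 7), (7, 2), (8, 6),
         (9, 8), (10, 11), (11, 15), (12, 9), (13, 14), (14, 12), (15, 16), (16, 13)]

r_ranks = rank_correlation(ranks)

# Cross-check with the ordinary product-moment coefficient, formula (40).
n = len(ranks)
mean = (n + 1) / 2
sum_xy = sum((x - mean) * (y - mean) for x, y in ranks)
var = (n ** 2 - 1) / 12
r_product_moment = sum_xy / (n * math.sqrt(var) * math.sqrt(var))

print(r_ranks, r_product_moment)
```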

31. Bivariate probability distributions 

The theorems proved for a bivariate frequency distribution in
§§23-28 hold equally for a probability distribution of two variates,
relative frequency in the former case being replaced by probability
in the latter. Thus, for a discrete distribution, if $p_i$ is the
probability of the occurrence of the pair of values $(x_i, y_i)$, the moment
$\mu'_{rs}$ about the origin is

$$ \mu'_{rs} = \sum_i p_i x_i^r y_i^s, $$

which is the expected value of the product $x^r y^s$. In particular the
expected values of x and y are

$$ E(x) = \sum_i p_i x_i, \qquad E(y) = \sum_i p_i y_i, $$

while, corresponding to (4) and (5), we have

$$ \sigma_x^2 = \mu'_{20} - \bar{x}^2, \qquad \sigma_y^2 = \mu'_{02} - \bar{y}^2 $$

and

$$ \mu_{11} = \mu'_{11} - \bar{x}\bar{y}, $$

$\mu_{11}$ being the covariance of the variates, which is the expected value
of the product of their deviations from their means. The coefficient
of correlation is defined by (19), which may here be expressed in the
alternative form

$$ r = \frac{E(x'y')}{\sigma_x \sigma_y}, \tag{19'} $$

x', y' being the deviations of the variates from their expected values.

In the case of continuous probability distributions we confine
our attention to those in which the probability that the variates
will fall simultaneously in the intervals dx and dy is expressible in
the form $\phi(x, y)\,dx\,dy$, the probability density $\phi(x, y)$ being
continuous and essentially positive. Then we have formulae
corresponding to (6)-(10), with $\phi(x, y)$ in place of $f(x, y)$.

The variates are independent if the probability distribution of
each is independent of the value assumed by the other. In particular,
if the variates are continuous, with probability densities $\phi_1(x)$ and
$\phi_2(y)$ respectively, the probability that x will fall in the interval dx,
and at the same time y will fall in the interval dy, is $\phi_1(x)\,dx\,\phi_2(y)\,dy$
by the theorem of compound probability. Then the probability
density $\phi(x, y)$ for the bivariate distribution is of the form $\phi_1(x)\,\phi_2(y)$.
Conversely, when this relation holds, the continuous variates are
independent. Now it was proved in §§10 and 12 that the covariance
of two independent variates is equal to zero. It follows from the
above definition of r that the coefficient of correlation of two independent
variates is equal to zero. The converse of this theorem, however, is
not necessarily true; that is to say, uncorrelated variables are not
necessarily independent.
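
A simple discrete case shows that zero correlation does not imply independence. The distribution below is a standard illustration and is not taken from the text: x takes the values -1, 0, 1 with equal probability and y is the square of x. Exact fractions are used so that the comparison involves no rounding.

```python
from fractions import Fraction

third = Fraction(1, 3)
# Assumed joint distribution: x in {-1, 0, 1} equally likely, and y = x**2.
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

def expect(g):
    """Expected value of g(x, y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

# Covariance as E(xy) - E(x)E(y).
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Marginal and joint probabilities of the values x = 0 and y = 0.
p_x0 = sum(p for (x, _), p in joint.items() if x == 0)
p_y0 = sum(p for (_, y), p in joint.items() if y == 0)
p_x0_y0 = joint.get((0, 0), Fraction(0))

print(cov, p_x0_y0, p_x0 * p_y0)
```

The covariance vanishes, so r is zero; yet the joint probability 1/3 differs from the product 1/9 of the marginal probabilities, so the variates are not independent.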

The moment generating function of the bivariate distribution is
defined as the expected value of the function $\exp(t_1 x + t_2 y)$, where
$t_1$ and $t_2$ are independent of x and y. Thus, in the case of a
continuous distribution, the m.g.f. with respect to the origin is

$$ M(t_1, t_2) = \iint \exp(t_1 x + t_2 y)\, \phi(x, y)\,dx\,dy. $$

When this integral has a meaning the exponential may be expanded
in powers of $t_1$ and $t_2$; and the coefficient of $t_1^r t_2^s / r!\,s!$ is the moment
$\mu'_{rs}$ about the origin. In Ex. 3 at the end of Chapter v we shall find
this function for the bivariate normal distribution, and use it to
calculate the moments.

When the variates x and y are independent we have

$$ M(t_1, t_2) = \int \exp(t_1 x)\, \phi_1(x)\,dx \int \exp(t_2 y)\, \phi_2(y)\,dy = (\text{function of } t_1) \times (\text{function of } t_2). $$

And conversely, when the m.g.f. is of this form, the variates are
independent.

32. Variance of a sum of variates 

Consider the variates x and y, with standard deviations $\sigma_x$ and
$\sigma_y$ respectively. Their distributions determine a bivariate
probability distribution for the two variates, with correlation coefficient
r given by (19'); and each value that may be taken by their sum

$$ u = x + y \tag{43} $$

has a definite probability. Then, in virtue of §10 (10),

$$ E(u) = E(x) + E(y), \tag{44} $$

and therefore, if x', y', u' are the deviations of the variates from
their means, we obtain from (43) and (44) by subtraction

$$ u' = x' + y'. $$

Consequently

$$ u'^2 = x'^2 + y'^2 + 2x'y', $$

and, on taking the expected value of each member we deduce, in
virtue of (19'),

$$ \sigma_u^2 = \sigma_x^2 + \sigma_y^2 + 2r\sigma_x\sigma_y. \tag{45} $$

This is the required formula for the variance of the sum of the
variates. And, since -r is the correlation between x and -y, it
follows that the variance of the difference

$$ v = x - y $$

is given by

$$ \sigma_v^2 = \sigma_x^2 + \sigma_y^2 - 2r\sigma_x\sigma_y. \tag{46} $$
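
Formulae (45) and (46) can be checked on a small joint probability table. The table below is hypothetical, and the helpers `E` and `var` are introduced only for this sketch.

```python
import math

# Hypothetical joint probabilities p[(x, y)]; the values are illustrative only.
p = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.4}

def E(g):
    """Expected value of g(x, y)."""
    return sum(prob * g(x, y) for (x, y), prob in p.items())

def var(g):
    """Variance of g(x, y), as the expected squared deviation from its mean."""
    m = E(g)
    return E(lambda x, y: (g(x, y) - m) ** 2)

var_x = var(lambda x, y: x)
var_y = var(lambda x, y: y)
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
r = cov / math.sqrt(var_x * var_y)
sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

# Variance of the sum: directly, and from formula (45).
var_sum_direct = var(lambda x, y: x + y)
var_sum_formula = var_x + var_y + 2 * r * sd_x * sd_y

# Variance of the difference: directly, and from formula (46).
var_diff_direct = var(lambda x, y: x - y)
var_diff_formula = var_x + var_y - 2 * r * sd_x * sd_y

print(var_sum_direct, var_sum_formula, var_diff_direct, var_diff_formula)
```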

If r is zero, as in the case of independence of x and y, we have the
simple result

$$ \sigma_u^2 = \sigma_v^2 = \sigma_x^2 + \sigma_y^2, \tag{47} $$

already proved in §§10, 15 and 16.

More generally, let u be a linear function of the variates x, y, z, ...,
so that

$$ u = ax + by + cz + \ldots, \tag{48} $$

the constants a, b, c, ... being either positive or negative. Then

$$ E(u) = aE(x) + bE(y) + cE(z) + \ldots $$

and therefore by subtraction

$$ u' = ax' + by' + cz' + \ldots, $$

the primes indicating that the variates are measured from their
means. On squaring both sides, and taking expected values, we
obtain the required formula

$$ \sigma_u^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + c^2\sigma_z^2 + \ldots + 2ab\,r_{xy}\sigma_x\sigma_y + \ldots, \tag{49} $$

in which $r_{xy}$ is the coefficient of correlation between x and y.

EXAMPLES IV

1. Show that, if a and b are constants and r is the correlation
between x and y, then the correlation between ax and by is equal to
r if the signs of a and b are alike, and to -r if they are different.

Also show that, if the constants a, b, c are positive, the correlation
between (ax + by) and cy is equal to

$$ (ar\sigma_x + b\sigma_y) / \sqrt{a^2\sigma_x^2 + b^2\sigma_y^2 + 2abr\sigma_x\sigma_y}. $$

2. The variables x and y are connected by the equation
ax + by + c = 0. Show that the correlation between them is -1 if
the signs of a and b are alike, and +1 if they are different.

3. Show that, if x', y' are the deviations of the variables from
their means,

$$ r = -1 + \frac{1}{2N} \sum_i f_i \left( \frac{x'_i}{\sigma_x} + \frac{y'_i}{\sigma_y} \right)^2, $$

and deduce that $-1 \le r \le 1$. (Rietz, 1927, 2, p. 84.)

4. The variates x and y have zero means, the same variance $\sigma^2$
and zero correlation. Show that

$$ (x\cos\alpha + y\sin\alpha) \quad \text{and} \quad (x\sin\alpha - y\cos\alpha) $$

have the same variance $\sigma^2$ and zero correlation.

5. Weighted mean with minimum variance. Let $x_i$ (i = 1, ..., n)
be n independent variates with variances $\sigma_i^2$. If the variates are
given weights $w_i$, their weighted mean is

$$ \frac{\sum_i w_i x_i}{\sum_i w_i} = \sum_i c_i x_i, $$

where $c_i = w_i / \sum_i w_i$.

We can show that the variance of this weighted mean is least when
the weights are inversely proportional to the variances $\sigma_i^2$. In virtue
of (49) the variance of the weighted mean is

$$ \sigma^2 = \sum_i c_i^2 \sigma_i^2 = \frac{\sum_i w_i^2 \sigma_i^2}{\left(\sum_i w_i\right)^2}. $$

For a minimum $\sigma^2$ the partial derivatives of this with respect to the
$w_i$ must be zero. This requires

$$ \frac{w_i \sigma_i^2}{\left(\sum_j w_j\right)^2} - \frac{\sum_j w_j^2 \sigma_j^2}{\left(\sum_j w_j\right)^3} = 0, $$

so that

$$ w_i \sigma_i^2 = \left(\sum_j w_j^2 \sigma_j^2\right) \Big/ \left(\sum_j w_j\right), $$

showing that $w_i \sigma_i^2$ is the same for all values of i. Thus the weights
of the variates are inversely proportional to their variances.

Show that this minimum variance is equal to H/n, where H is
the harmonic mean of the variances $\sigma_i^2$.

6. For a given bivariate distribution find the straight line for
which the sum of the squares of the normal deviations is a minimum.

Let the straight line be $x\cos\alpha + y\sin\alpha = p$. Then we have to
minimize the sum of squares $\sum_i f_i (x_i\cos\alpha + y_i\sin\alpha - p)^2$ by equating
to zero its partial derivatives with respect to p and $\alpha$. Show, from
the first of the equations thus obtained, that the required line passes
through the mean of the distribution. Then, taking the mean as
origin, show from the second equation that $\alpha$ is given by

$$ \tan 2\alpha = \frac{2r\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}. $$

Of the two directions at right angles, found from this equation, one
makes the sum of squares a minimum, and the other a maximum,
for lines through the mean of the distribution. (L. J. Reed, Metron,
vol. i, 1921, part 3, pp. 54-61.)

7. The ranks of the same 15 students in Mathematics and Latin 
were as follows, the two numbers within brackets denoting the ranks 
of the same student: (1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3), 
(8,1), (9,11), (10,15), (11,9), (12,5), (13,14), (14,12), (15,13). 
Show that the rank correlation coefficient is 0.51.

8. The marks, x and y, gained by 1,000 students for theory and
laboratory work respectively, are grouped with common class
interval of 5 marks for each variable, the frequencies for the various
classes being shown in the correlation table below. The values of
x and y indicated are the mid-values of the classes.

[Correlation table: the cell frequencies are not recoverable from the
source; only the class mid-values and marginal totals survive.]

    x (mid-value)    42    47    52    57    62    67    72    77    82   Total
    Frequency        26    97   226   244   189   137    55    21     5   1,000

    y (mid-value)    52    57    62    67    72    77    82    87   Total
    Frequency        35   103   192   263   214   130    46    17   1,000

Show that the coefficient of correlation is 0.68, and the regression
equation of y on x

$$ y = 29.7 + 0.656x. $$

9. The logarithm of the m.g.f., $M(t_1, t_2)$, of §31 is the cumulative
function $K(t_1, t_2)$; and the cumulant $\kappa_{rs}$ is the coefficient of $t_1^r t_2^s / r!\,s!$
in the expansion of $K(t_1, t_2)$ in powers of $t_1$ and $t_2$. Show that

$$ \kappa_{22} = \mu_{22} - \mu_{20}\mu_{02} - 2\mu_{11}^2. $$

10. By means of the identity

$$ \sum f(u - v)^2 = \sum fu^2 + \sum fv^2 - 2\sum fuv $$

and the fact that u - v is constant along each of one set of the
diagonal lines of a correlation table, the sum of products may
be calculated without difficulty. Tabulate the quantities u - v,
$(u - v)^2$, f and $f(u - v)^2$ for each diagonal line of the correlation
table on p. 77, and deduce that $\sum f(u - v)^2 = 1147$. Using the
values found for $\sum fu^2$ and $\sum fv^2$ verify that $\sum fuv = 1109$.
Deduce the same result from the identity

$$ \sum f(u + v)^2 = \sum fu^2 + \sum fv^2 + 2\sum fuv, $$

using the other set of diagonal lines.



CHAPTER V 

FURTHER CORRELATION THEORY. 
CURVED REGRESSION LINES 

33. Arrays. Linear regression 

In the numerical illustration of §29 all the values of x in any one
vertical array were regarded as equal to the mid-value of x for that 
interval. Since the actual values of x vary over a range of 5 years, 
the results obtained cannot be regarded as more than a good approxi- 
mation. A better approximation could be obtained by choosing a 
smaller class interval; but that would make the numerical work 
correspondingly heavier. For accurate work the class interval must 
be so small that all the values of x in a vertical array are either 
exactly or very nearly equal; and similarly all the y's in a horizontal 
array. The theoretical proof of certain formulae, however, is not 
made any more difficult by a choice of arrays sufficiently numerous 
to satisfy the above conditions; for our sums are just as easy to 
manage whether the number of arrays is large or small. And, if 
we are dealing with a continuous distribution, we may take in- 
finitesimal class intervals, dx and dy, and replace our sums by 
definite integrals. We assume then that, in the ith vertical array, all
the x's have the same value, $x_i$.

It will be convenient to extend our subscript notation as follows.
Though the x's in the ith vertical array all have the same value, the
y's are different. A typical pair of values in the array is $(x_i, y_{ij})$,
represented by the point $P_{ij}$; and we shall denote its frequency by $f_{ij}$.
Thus the first subscript, i, indicates the vertical array, while the
second, j, indicates the position in that array. Denoting the total
frequency in the ith array by $n_i$ we have

$$ n_i = \sum_j f_{ij}, \tag{1} $$

$\sum_j$ indicating summation within that array. The mean value of y
in this array will be denoted by $\bar{y}_i$. Thus

$$ n_i \bar{y}_i = \sum_j f_{ij} y_{ij}. \tag{2} $$

The mean of the array is represented by the point $G_i$ whose
coordinates are $(x_i, \bar{y}_i)$. $H_i$ is the point with abscissa $x_i$ on the line of
regression of y on x. The mean, G, of the distribution also lies on
this line of regression, and has coordinates $(\bar{x}, \bar{y})$ given by

$$ N\bar{x} = \sum_i n_i x_i, \qquad N\bar{y} = \sum_i n_i \bar{y}_i. \tag{3} $$

[FIG. 6]

The summation over the whole distribution is thus divided into
two summations; for $\sum_j$ denotes summation for terms in the same
vertical array, and then $\sum_i$ indicates summation of the result over
all the arrays.

It is worth noting that the same equation is obtained for the line
of regression of y on x, if each value of y in any array is replaced by
the mean value of y in that array. For the equation of this regression
line depends only on $\bar{x}$, $\bar{y}$, $\sigma_x^2$ and $\mu_{11}$. And, since the above change
does not affect the x's, $\bar{x}$ and $\sigma_x^2$ are unaltered by it. So also is $\bar{y}$,
in virtue of the second equation (3). As for $\mu_{11}$ we see that

$$ N\mu_{11} = \sum_i \sum_j f_{ij} x_i y_{ij} - N\bar{x}\bar{y} = \sum_i n_i x_i \bar{y}_i - N\bar{x}\bar{y}, $$

by (2) and (3), and is therefore unaltered by the above change of y's.
Consequently, the regression line of y on x is unaltered. It follows
that, if the means of the vertical arrays are collinear, their line must
coincide with the line of regression of y on x. This regression is then
said to be linear. Similarly, the regression of x on y is said to be
linear if the means of the horizontal arrays all lie on the line of
regression of x on y. It is possible for either regression to be linear,
or both.

It was shown in §26 that $S_y^2 = \sigma_y^2(1 - r^2)$ gives the mean square
deviation of points from the line of regression of y on x. If this
regression is linear the means of the vertical arrays lie on this line
of regression; and, if further the vertical arrays all have the same
variance, this common variance must be $S_y^2$. The S.D. of each array
is then $\sigma_y\sqrt{1 - r^2}$. When the vertical arrays all have the same
variance, the regression of y on x is said to be homoscedastic.

34. Correlation ratios 

Consider next the deviation $G_iP_{ij}$ of the point $P_{ij}$ from the mean
of the vertical array in which it lies. The sum of the squares of
these deviations for the whole distribution is denoted by $NS_y'^2$,
so that

$$ NS_y'^2 = \sum_i \sum_j f_{ij} (y_{ij} - \bar{y}_i)^2. \tag{4} $$

Then, by analogy with §26 (22), the correlation ratio,* $\eta_y$, of y on x
is defined by

$$ \eta_y^2 = 1 - S_y'^2/\sigma_y^2, \tag{5} $$

$\eta_y$ being regarded as positive. Thus

$$ S_y'^2 = \sigma_y^2 (1 - \eta_y^2), \tag{6} $$

which corresponds to §26 (21). From (5) it is clear that $\eta_y^2 \le 1$.
Also it is easy to show that $\eta_y^2 \ge r^2$. This is evident from a comparison
of (5) with §26 (22), since $S_y'^2 \le S_y^2$, the sum of the squares of the
deviations in any array being least when they are measured from
the mean of the array. Thus

$$ 1 \ge \eta_y^2 \ge r^2. \tag{7} $$

When the regression of y on x is linear, the straight line of means of
arrays coincides with the line of regression, and $\eta_y^2$ is then equal
to $r^2$. A non-zero value of $\eta_y^2 - r^2$ is thus associated with a departure
of the regression from linearity.

* Some writers denote this 'ratio' by $\eta_{yx}$.




It is clear from (6) that the more nearly ?/| approaches unity, the 
smaller is S'*, and therefore the closer are the points to the curve of 
means of the vertical arrays. If 97 J = 1, then S'y = 0, so that all the 
deviations are zero, and all the points lie on the curve of means. 
There is then a functional relation between x and y. We may there- 
fore describe the correlation ratio rj v as a measure of the degree to 
which the association between the variables approaches a functional 
relationship expressible in the form y = F(x), where F(x) is a single- 
valued function of x. This should bo compared with the corre- 
sponding interpretation of r in 26. 

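A convenient expression for $\eta_y$ can be found, involving the
S.D. $\sigma_{my}$ of the means of the vertical arrays, each mean being weighted
with the frequency $n_i$ of that array. This S.D. is given by

$$N\sigma_{my}^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2. \tag{8}$$

Now

$$N\sigma_y^2 = \sum_i \sum_j f_{ij}\{(y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})\}^2 = \sum_i \sum_j f_{ij}(y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2,$$

the sum of products being zero, since $\sum_j f_{ij}(y_{ij} - \bar{y}_i)$ vanishes for each
array. The first sum on the right is $NS_y'^2$, in virtue of (4), and
the second is $N\sigma_{my}^2$. Consequently the equation is equivalent to

$$\sigma_y^2 = S_y'^2 + \sigma_{my}^2. \tag{9}$$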



Thus the variance of the y's of the distribution is expressible as the 
sum of two parts, of which the first is the variance within the 
arrays, and the second the variance of the weighted means of the 
arrays. Also, comparing (9) with (6), we see that

$$\eta_y^2 = \sigma_{my}^2/\sigma_y^2,$$

and therefore

$$\eta_y = \sigma_{my}/\sigma_y. \tag{10}$$

The correlation ratio $\eta_y$ is therefore the ratio of the S.D. of the
weighted means of the arrays of y's to the S.D. of all the y's of the
distribution.

In the same way, by considering horizontal arrays in each of
which the value of y is constant, we may define the correlation ratio
$\eta_x$ of x on y. For, if the mean of the jth horizontal array has
abscissa $\bar{x}_j$, and $NS_x'^2$ denotes the sum of the squares of the
deviations of the points, each from the mean of the horizontal array in
which it lies, $\eta_x^2$ is given by

$$S_x'^2 = \sigma_x^2(1 - \eta_x^2), \tag{11}$$

and, as before, we have the relations

$$1 \ge \eta_x^2 \ge r^2 \tag{12}$$

and

$$\eta_x = \sigma_{mx}/\sigma_x, \tag{13}$$

where $\sigma_{mx}$ is the S.D. of the means of the horizontal arrays, each
mean being weighted with the frequency $n_j$ of that array.
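The resolution (9) and the ratio (10) can be checked numerically. The following is a minimal sketch only: the function name `correlation_ratio` and the small data set are hypothetical and are not taken from the text, and all variances use the divisor N as above.

```python
from math import sqrt

def correlation_ratio(arrays):
    """Return (eta_y, within, between, total) for y-values grouped by x.

    `arrays` maps each x-value to the list of y-values in its vertical
    array. All three variances use the divisor N, as in the text.
    """
    all_y = [y for ys in arrays.values() for y in ys]
    N = len(all_y)
    y_bar = sum(all_y) / N
    total = sum((y - y_bar) ** 2 for y in all_y) / N      # sigma_y^2
    within = 0.0                                          # S'_y^2, eq. (4)
    between = 0.0                                         # sigma_my^2, eq. (8)
    for ys in arrays.values():
        n_i = len(ys)
        mean_i = sum(ys) / n_i
        within += sum((y - mean_i) ** 2 for y in ys) / N
        between += n_i * (mean_i - y_bar) ** 2 / N
    return sqrt(between / total), within, between, total  # eq. (10)

# Hypothetical data: three vertical arrays at x = 1, 2, 3.
arrays = {1: [2.0, 3.0, 4.0], 2: [5.0, 7.0], 3: [4.0, 5.0, 6.0, 5.0]}
eta_y, within, between, total = correlation_ratio(arrays)
print(round(eta_y, 4))
```

The two parts `within` and `between` should add up to `total`, which is the statement of (9); the same function applied to x-values grouped by y would give $\eta_x$.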

35. Calculation of correlation ratios 

Formula (10), giving the value of $\eta_y$, is clearly equivalent to

$$\eta_y^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2 / N\sigma_y^2. \tag{14}$$

Now the sum in the numerator may be evaluated without calculating
the deviations $\bar{y}_i - \bar{y}$. For

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i n_i\bar{y}_i^2 - N\bar{y}^2.$$

But

$$n_i\bar{y}_i = \sum_j f_{ij}y_{ij} = T_i,$$

where $T_i$ is the sum of the y's in the ith vertical array. Hence

$$N\bar{y} = \sum_i n_i\bar{y}_i = \sum_i T_i = T,$$

where T is the sum of the y's for the whole distribution. We may
therefore write

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N} = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \dots - \frac{T^2}{N}, \tag{15}$$

and (14) becomes

$$\eta_y^2 = \left(\sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N}\right) / N\sigma_y^2. \tag{16}$$

Since the deviation from the mean is independent of the choice of
origin, while numerator and denominator of (14) are altered in the
same ratio by change of unit, the value of $\eta_y^2$ calculated from (16)
is unaffected by change of origin and unit.
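As an illustration of (16) and of its invariance, here is a short sketch. The helper name `eta_squared_from_totals` and the data are hypothetical; the second data set is the first after a change of origin (to 4) and of unit (to 0.5).

```python
def eta_squared_from_totals(arrays):
    """eta_y^2 from array totals: (sum T_i^2/n_i - T^2/N) / (N sigma_y^2)."""
    N = sum(len(ys) for ys in arrays)
    T = sum(sum(ys) for ys in arrays)              # total of all the y's
    sum_sq = sum(y * y for ys in arrays for y in ys)
    n_var = sum_sq - T * T / N                     # N * sigma_y^2
    between = sum(sum(ys) ** 2 / len(ys) for ys in arrays) - T * T / N
    return between / n_var

# Hypothetical vertical arrays (one inner list per array).
arrays = [[2.0, 3.0, 4.0], [5.0, 7.0], [4.0, 5.0, 6.0, 5.0]]
# The same data measured from the origin 4 in units of 0.5.
shifted = [[(y - 4.0) / 0.5 for y in ys] for ys in arrays]

e1 = eta_squared_from_totals(arrays)
e2 = eta_squared_from_totals(shifted)
print(round(e1, 4), round(e2, 4))
```

Only the totals $T_i$, the frequencies $n_i$ and the sum of squares of the y's are needed, which is why the formula suits hand calculation from a correlation table.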

Example. Calculate the correlation ratios for the distribution of ages of
bridegroom and bride given in 29.

Working in terms of v and the 5-year unit we have

T = -371, N = 1,000, $T^2/N$ = 137.64.

The values of $T_i$ are given in the row prefixed F. Hence

$$\sum_i \frac{T_i^2}{n_i} = 876.81, \qquad N\sigma_v^2 = 1393.4.$$

Consequently

$$\eta_y^2 = \frac{876.81 - 137.64}{1393.4} = \frac{739.17}{1393.4} = 0.5305,$$

and $\eta_y$ = 0.729 nearly.

This is only slightly greater than the value 0.726 found for r.

The reader should verify, in the same way, that $\eta_x^2$ = 0.5504, so that
$\eta_x$ = 0.742. The departure of the regressions from linearity is only slight.

36. Other relations 

We shall now prove that the mean square deviation of the
weighted means of the vertical arrays from the line of regression
of y on x is equal to $\sigma_y^2(\eta_y^2 - r^2)$; that is to say

$$\frac{1}{N}\sum_i n_i(\bar{y}_i - Y_i)^2 = \sigma_y^2(\eta_y^2 - r^2), \tag{17}$$

where $Y_i$ is the estimate of $y_i$ from the regression equation 25 (16).
For, subtraction of (6) from 26 (21) shows that

$$N\sigma_y^2(\eta_y^2 - r^2) = NS_y^2 - NS_y'^2 = \sum_i \sum_j f_{ij}[\{(y_{ij} - \bar{y}_i) + (\bar{y}_i - Y_i)\}^2 - (y_{ij} - \bar{y}_i)^2] = \sum_i n_i(\bar{y}_i - Y_i)^2,$$

the sum of products vanishing, since $\sum_j f_{ij}(y_{ij} - \bar{y}_i)$ is zero for each
array. Thus (17) has been established. The difference $\eta_y^2 - r^2$ is zero
only when each of the deviations $\bar{y}_i - Y_i$ is zero. In that case the
means of arrays all lie on the line of regression, and the regression
of y on x is linear. We thus see again that a non-zero value of $\eta_y^2 - r^2$
is associated with departure of the regression from linearity.

We may prove, for use in a later chapter, that the sum of squares
$\sum_i n_i(\bar{y}_i - \bar{y})^2$ may be resolved into two separate sums. Thus

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i n_i\{(\bar{y}_i - Y_i) + (Y_i - \bar{y})\}^2 = \sum_i n_i(\bar{y}_i - Y_i)^2 + \sum_i n_i(Y_i - \bar{y})^2, \tag{18}$$

the sum of products vanishing, since

$$\sum_i n_i(\bar{y}_i - Y_i)(Y_i - \bar{y}) = \sum_i \sum_j f_{ij}(y_{ij} - Y_i)(Y_i - \bar{y}) = 0$$

in virtue of 27 (30). The sum in the first member of (18) is $N\sigma_{my}^2$,
or $N\eta_y^2\sigma_y^2$ by (10). The first sum on the right of (18) has just been
proved equal to $N\sigma_y^2(\eta_y^2 - r^2)$; and the final sum has the value
$Nr^2\sigma_y^2$, in virtue of 27 (32). The equation (18) thus corresponds to
the identity

$$\eta_y^2\sigma_y^2 = (\eta_y^2 - r^2)\sigma_y^2 + r^2\sigma_y^2. \tag{19}$$



37. Continuous distributions 

The preceding proofs may be adapted to a continuous distribution
by an appropriate change of notation, and a replacement of sums by
definite integrals. The vertical array, whose abscissa is x, we assume
to be of infinitesimal breadth dx. Then, if f(x, y) is the relative
frequency density, the relative frequency for this array is

$$\int [f(x, y)\,dx]\,dy = f_1(x)\,dx, \tag{20}$$

where

$$f_1(x) = \int f(x, y)\,dy, \tag{21}$$

the integration with respect to y extending over the whole of that
array. The mean $\bar{y}_x$ of the y's in the array is given by

$$\bar{y}_x f_1(x) = \int y f(x, y)\,dy, \tag{22}$$

and this is the equation of the curve of means of the vertical arrays.
The mean of the whole distribution has coordinates

$$\bar{x} = \iint x f(x, y)\,dx\,dy = \int x f_1(x)\,dx, \qquad \bar{y} = \iint y f(x, y)\,dx\,dy = \int \bar{y}_x f_1(x)\,dx, \tag{23}$$

integration with respect to x including all the vertical arrays. The
above relations correspond to the equations (1), (2) and (3) for a
discrete distribution.

The mean square deviation of the values of y, from the means
of the arrays in which they lie, is given by

$$S_y'^2 = \iint (y - \bar{y}_x)^2 f(x, y)\,dx\,dy, \tag{24}$$

and $\eta_y$ is defined as before by an equation of the form (5) or (6).
The variance of the weighted means of the vertical arrays is

$$\sigma_{my}^2 = \int (\bar{y}_x - \bar{y})^2 f_1(x)\,dx, \tag{25}$$

and, as before, this may be expressed in terms of $\sigma_y^2$ and $S_y'^2$, since

$$\sigma_y^2 = \iint [(y - \bar{y}_x) + (\bar{y}_x - \bar{y})]^2 f(x, y)\,dx\,dy = S_y'^2 + \sigma_{my}^2, \tag{26}$$

the integral of the product being equal to zero; and from this it
follows, as in the case of a discrete distribution, that

$$\eta_y = \sigma_{my}/\sigma_y.$$

Formulae corresponding to those of 36 may be similarly established.
For, by subtraction of (24) from the result in the Example
of 26,

$$\sigma_y^2(\eta_y^2 - r^2) = \iint [\{(y - \bar{y}_x) + (\bar{y}_x - Y)\}^2 - (y - \bar{y}_x)^2] f(x, y)\,dx\,dy = \int (\bar{y}_x - Y)^2 f_1(x)\,dx,$$

and this is the mean square deviation of the weighted means of
arrays from the line of regression of y on x. And, corresponding to
(18), we have the resolution

$$\int (\bar{y}_x - \bar{y})^2 f_1(x)\,dx = \int [(\bar{y}_x - Y) + (Y - \bar{y})]^2 f_1(x)\,dx = \int (\bar{y}_x - Y)^2 f_1(x)\,dx + \int (Y - \bar{y})^2 f_1(x)\,dx,$$

the various integrals corresponding to terms of the identity (19).

38. Bivariate normal distribution 

We shall now consider briefly the continuous bivariate distribution,
which is a generalization of the normal distribution discussed
in Chapter III. It may be introduced simply as follows.* Assume
first that the variable x is normally distributed with S.D. $\sigma_1$. Then,
if the variable is measured from its mean, the probability that a
random value of x will fall in the interval dx is

$$dP_1 = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma_1^2}\right) dx.$$

Assume next that the regression of y on x is linear and homoscedastic.
Then, if $\sigma_2$ is the S.D. of y in the distribution, the common
variance of the arrays of y's is $\sigma_2^2(1 - \rho^2)$, where $\rho$ is the† coefficient
of correlation between the variables. Finally, assume that each
array of y's is normally distributed. Then, since the mean of each
array is on the line of regression

$$y = \rho x\sigma_2/\sigma_1, \tag{27}$$

and the variance of each array is as stated, the probability that a
value of y, taken at random in an assigned vertical array, will fall
in the interval dy is

$$dP_2 = \frac{1}{\sigma_2\sqrt{2\pi(1 - \rho^2)}} \exp\left[-\frac{(y - \rho x\sigma_2/\sigma_1)^2}{2\sigma_2^2(1 - \rho^2)}\right] dy.$$

* Cf. Rietz, 1927, 2, pp. 104-7.

† In the theory of sampling, r refers to the sample, and $\sigma_1$, $\sigma_2$, $\rho$ to the
population from which the sample is drawn.

By the theorem of compound probability, the chance of a pair of
values (x, y) falling in the elementary rectangle dx dy is

$$dP_1\,dP_2 = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right] dx\,dy.$$

The probability density $\phi(x, y)$ for the distribution is therefore

$$\phi(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right]. \tag{28}$$




FIG. 7 



Such a distribution is called a bivariate normal distribution, and the
variables are said to be normally correlated. The surface $z = \phi(x, y)$
is the normal correlation surface. Since (28) is of the same form in x
as in y, we may conclude that the regression of x on y is also linear,
the variance of each array of x's being $\sigma_1^2(1 - \rho^2)$. The values of x
in each such array are normally distributed, with mean on the line
of regression of x on y, whose equation is

$$x = \rho y\sigma_1/\sigma_2. \tag{29}$$

The m.g.f. for the distribution is found below,* and from it the first
few moments are calculated.

The nature of the normal correlation surface is indicated in the
diagram. The curves along which the probability density (or the
relative frequency density) is constant are the homothetic ellipses

$$\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2} = \lambda^2. \tag{30}$$

With respect to these the line of regression of y on x is conjugate to
the y-axis. For the locus of the mid-points of chords x = const. is
the straight line (27). Similarly the line of regression of x on y is
conjugate to the x-axis.†
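The construction of this section, a normal x followed by a normal y in the vertical array of that x, can be sketched in code. The parameter values, the function names and the sample size below are assumptions chosen for illustration; the first check is that the product of the two probabilities reproduces the density (28), and the second that pairs drawn in this two-step way show a correlation close to $\rho$.

```python
import random
from math import exp, pi, sqrt

def phi(x, y, s1, s2, rho):
    """Bivariate normal density (28), both variables measured from their means."""
    q = (x / s1) ** 2 - 2 * rho * x * y / (s1 * s2) + (y / s2) ** 2
    return exp(-q / (2 * (1 - rho ** 2))) / (2 * pi * s1 * s2 * sqrt(1 - rho ** 2))

def normal_pdf(t, mean, sd):
    """Density of a normal variate with the given mean and S.D."""
    return exp(-((t - mean) ** 2) / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

def sample_pair(rng, s1, s2, rho):
    """Draw x first, then y from its vertical array (mean on the line (27))."""
    x = rng.gauss(0.0, s1)
    y = rng.gauss(rho * x * s2 / s1, s2 * sqrt(1 - rho ** 2))
    return x, y

s1, s2, rho = 2.0, 1.5, 0.6          # hypothetical parameters

# Compound probability: density of x times density of y within its array.
x0, y0 = 0.7, -0.4
product = normal_pdf(x0, 0.0, s1) * normal_pdf(
    y0, rho * x0 * s2 / s1, s2 * sqrt(1 - rho ** 2))

rng = random.Random(0)               # fixed seed so the sketch is repeatable
pairs = [sample_pair(rng, s1, s2, rho) for _ in range(20000)]
n = len(pairs)
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
sxy = sum((x - mx) * (y - my) for x, y in pairs)
sxx = sum((x - mx) ** 2 for x, _ in pairs)
syy = sum((y - my) ** 2 for _, y in pairs)
sample_r = sxy / sqrt(sxx * syy)
print(round(sample_r, 3))
```

The sample correlation is itself subject to sampling fluctuation, so only approximate agreement with $\rho$ should be expected.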

39. Intraclass correlation 

Let us now consider the correlation between the measures of 
some common characteristic for pairs of members of the same family 
or class. For example, we may be interested in the correlation 
between the weights of brothers, or the heights of sisters. The 
relation between two members of the same family is a reciprocal 
one; for, if P belongs to the same family as Q, then Q belongs to the 
same family as P. Each pair of members, P and Q, will therefore 
contribute two entries to the correlation table. In one of them x 
will be the measure of the characteristic for P, and y the measure 
for Q ; in the other, x will be the measure for Q, and y for P. The table 
will thus be symmetrical. 

We shall consider only the case in which each of the h families
has the same number, k, of members. Then there are k(k - 1) pairs
of values for each family, and

$$N = hk(k - 1) \tag{31}$$

gives the total number of pairs of values in the table. Let $x_{ij}$ denote
the measure of the characteristic for the jth member of the ith
family. Thus the first subscript indicates the family, and the
second the particular member of that family. Consequently i takes
the values 1, 2, ..., h, and j the values 1, 2, ..., k. The x's and the y's

* See Ex. V, 3. 

† For further properties of this distribution see Exx. 4, 5 and 6 at the
end of this chapter.

of the correlation table are the hk values $x_{ij}$ in different orders. Any
one value $x_{ij}$, occurring as an x in the table, will have as its y each
of the other k - 1 values for the same family. Thus each value $x_{ij}$
occurs k - 1 times as an x; and the mean $\bar{x}$ for the bivariate
distribution is therefore

$$\bar{x} = \frac{(k - 1)\sum_i \sum_j x_{ij}}{hk(k - 1)} = \frac{1}{hk}\sum_i \sum_j x_{ij}, \tag{32}$$

and, since the table is symmetrical, $\bar{y}$ has the same value. The
variance $\sigma_x^2$ of the x's is the same as the variance of the hk values $x_{ij}$,
since each of these occurs the same number of times in the
distribution. Thus

$$\sigma_x^2 = \frac{1}{hk}\sum_i \sum_j (x_{ij} - \bar{x})^2, \tag{33}$$

and $\sigma_y^2$ has the same value, in virtue of symmetry. We may denote
this common variance by $\sigma^2$, so that

$$\sigma_x^2 = \sigma_y^2 = \sigma^2. \tag{34}$$

The coefficient of intraclass correlation, r, is given by the usual
formula, which in this case is equivalent to

$$hk(k - 1)\sigma^2 r = \sum_i \sum_j \sum_{l \ne j} (x_{ij} - \bar{x})(x_{il} - \bar{x}). \tag{35}$$

The sum is a triple one; for the product $(x_{ij} - \bar{x})(x_{il} - \bar{x})$ is the
product of the deviations from the mean for the jth and lth members
of the ith family, and the sum must include all such products for
different values of j and l, and for all the families. We carry out first
the summation with respect to l, observing that l takes all integral
values from 1 to k except the value j. Thus the sum of the terms
$x_{il}$ includes all values for the ith family except $x_{ij}$, and may therefore
be written $k\bar{x}_i - x_{ij}$, where $\bar{x}_i$ is the mean for the ith family. The sum
of the terms $\bar{x}$ for the k - 1 values of l is $(k - 1)\bar{x}$. The triple sum in
(35) may therefore be written as the double sum

$$\sum_i \sum_j (x_{ij} - \bar{x})\{k(\bar{x}_i - \bar{x}) - (x_{ij} - \bar{x})\} = k\sum_i \sum_j (x_{ij} - \bar{x})(\bar{x}_i - \bar{x}) - \sum_i \sum_j (x_{ij} - \bar{x})^2.$$

Now by (33) the last sum on the right is equal to $hk\sigma^2$. To evaluate
the other we first carry out the summation with respect to j. Thus

$$k\sum_i (\bar{x}_i - \bar{x}) \sum_j (x_{ij} - \bar{x}) = k^2\sum_i (\bar{x}_i - \bar{x})^2 = hk^2\sigma_m^2,$$

where $\sigma_m^2$ is the variance of the means of the families.
Substituting these values in (35) we obtain

$$hk(k - 1)\sigma^2 r = hk^2\sigma_m^2 - hk\sigma^2.$$

The coefficient of intraclass correlation is therefore given by the
formula

$$r = \frac{k\sigma_m^2 - \sigma^2}{(k - 1)\sigma^2}. \tag{36}$$
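The result, r equal to $(k\sigma_m^2 - \sigma^2)/\{(k - 1)\sigma^2\}$, can be compared with a direct count over the symmetrical table. The sketch below uses the heights of Ex. V, 1 from the end of this chapter; the function names are hypothetical.

```python
def intraclass_r(families):
    """Intraclass correlation from the variance of family means (k members each)."""
    h, k = len(families), len(families[0])
    assert all(len(f) == k for f in families)
    values = [v for f in families for v in f]
    mean = sum(values) / (h * k)
    var = sum((v - mean) ** 2 for v in values) / (h * k)               # sigma^2
    var_means = sum((sum(f) / k - mean) ** 2 for f in families) / h    # sigma_m^2
    return (k * var_means - var) / ((k - 1) * var)

def intraclass_r_direct(families):
    """The same coefficient from all ordered pairs (j, l), l != j, as in (35)."""
    h, k = len(families), len(families[0])
    values = [v for f in families for v in f]
    mean = sum(values) / (h * k)
    var = sum((v - mean) ** 2 for v in values) / (h * k)
    triple_sum = sum((f[j] - mean) * (f[l] - mean)
                     for f in families
                     for j in range(k) for l in range(k) if l != j)
    return triple_sum / (h * k * (k - 1) * var)

# Heights of brothers from Ex. V, 1: five families of three.
heights = [(67, 68, 69), (68, 68, 71), (68, 70, 72), (70, 70, 73), (71, 72, 73)]
r_formula = intraclass_r(heights)
r_direct = intraclass_r_direct(heights)
print(round(r_formula, 4), round(r_direct, 4))
```

Both routes give 1/3 for these data, in agreement with the value stated in the exercise.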



CURVED REGRESSION LINES 

40. Polynomial regression. Normal equations 

We have seen that the line of regression of y on x gives the best 
representation of the behaviour of y with change of x, that can be 
given by a straight line, the term 'best' indicating that the sum of 
the squares of the deviations (or 'residuals') is less for this straight
line than for any other. But it is often apparent from the data that
the regression of y on x is far from linear; and it may be desirable to
find an equation of regression which affords a better representation
of the behaviour of y than can be given by a straight line. The
simplest type of non-linear regression equation is that in which
one of the variables is expressed as a polynomial in the other.
Polynomial regression of y on x is therefore represented by an
equation of the form

$$Y = b_0 + b_1x + b_2x^2 + \dots + b_kx^k, \tag{37}$$

in which the coefficients $b_s$ are constants. Our problem is to
determine these constants so that the sum of the squares of the residuals
is a minimum. The choice of the degree, k, of the polynomial is at
our disposal. If there are n different pairs of values in the
distribution, we can make the curve pass through all the representative
points by choosing k = n - 1. But, to keep the arithmetical work
reasonably simple, k must be fairly small, say 2, 3, or 4. The
distribution of points in the scatter diagram will frequently suggest the
shape of the curve of regression, and thus a suitable value for k.
Having decided on the value of k, we determine the coefficients
$b_s$ by the method of least squares.

With the notation of 23 suppose there are n different pairs of
values, $f_i$ being the frequency of the pair $(x_i, y_i)$, and the total
frequency N being given by

$$N = \sum_{i=1}^{n} f_i. \tag{38}$$

Let $H_i$ be the point on the curve (37) with abscissa $x_i$. Then its
ordinate $Y_i$ has the value

$$Y_i = b_0 + b_1x_i + \dots + b_kx_i^k = \sum_{s=0}^{k} b_sx_i^s, \tag{39}$$

and the deviation of the point $P_i$ from the curve of regression is

$$d_i = y_i - Y_i. \tag{40}$$

The sum of the squares of the deviations is

$$NS_y^2 = \sum_i f_i(y_i - Y_i)^2. \tag{41}$$

We have to choose the coefficients $b_s$ so that this sum is a minimum;
and this is done by equating to zero the partial derivatives of $S_y^2$
with respect to these coefficients. We thus obtain the k + 1 normal
equations

$$\sum_i f_ix_i^s(y_i - Y_i) = 0 \tag{42}$$

(i = 1, 2, ..., n; s = 0, 1, 2, ..., k).

Written separately, and in terms of the values of the given
distribution, these are

$$\begin{aligned}
\sum f_iy_i &= b_0N + b_1\sum f_ix_i + \dots + b_k\sum f_ix_i^k, \\
\sum f_ix_iy_i &= b_0\sum f_ix_i + b_1\sum f_ix_i^2 + \dots + b_k\sum f_ix_i^{k+1}, \\
&\;\;\vdots \\
\sum f_ix_i^ky_i &= b_0\sum f_ix_i^k + b_1\sum f_ix_i^{k+1} + \dots + b_k\sum f_ix_i^{2k}.
\end{aligned} \tag{42'}$$

The coefficients $b_s$ are determined from these k + 1 equations, which
involve the sums of powers of $x_i$ from $\sum f_ix_i$ to $\sum f_ix_i^{2k}$, and sums of
products from $\sum f_ix_iy_i$ to $\sum f_ix_i^ky_i$. For the particular case k = 1 these
equations are the same as 25 (12).

When the values $x_i$ correspond to equal increments, h, and the
distribution of the frequencies $f_i$ is symmetrical about $\bar{x}$, the
equations (42') may be much simplified. Suppose first that n is odd
and equal to 2m + 1. Then, by taking the origin of x at the middle
value of that variable, and the common increment h as unit of
measurement, we have for the values of x in the distribution

-m, -(m - 1), ..., -1, 0, 1, ..., (m - 1), m.

Hence, owing to the symmetry of the distribution of f's, the sums
of the odd powers of x are all zero. The sums of the even powers
may be written down from tables or algebraical formulae. If,
however, n is even and equal to 2m, we take the origin of x at the
mean of the middle pair of values, and h/2 as the new unit. The
values of x then become

-(2m - 1), ..., -3, -1, 1, 3, ..., (2m - 1),

and the sums of the odd powers of x vanish as before. Moreover,
the equations (42') then consist of two groups, one involving
$b_0, b_2, \dots$ and the other $b_1, b_3, \dots$. Their solution is thus simplified.

Example. Fit a parabolic curve of regression of y on x to the seven pairs
of values

x_i   1.0   1.5   2.0   2.5   3.0   3.5   4.0
y_i   1.1   1.3   1.6   2.0   2.7   3.4   4.1

The dot diagram of the seven pairs of values suggests a parabola. Since n
is odd, and the values of x correspond to equal increments 0.5, we take the
origin at the middle value 2.5 with 0.5 as the new unit. This is equivalent to
the transformation

u = 2x - 5.

Each frequency $f_i$ is unity. The sums of odd powers of u are zero, and the
calculation may be arranged as in the accompanying table. The normal
equations are

$16.2 = 7b_0 + 28b_2$, $14.3 = 28b_1$, $69.9 = 28b_0 + 196b_2$,

which give immediately

$b_0 = 2.07$, $b_1 = 0.511$, $b_2 = 0.061$.

The regression equation is therefore

$y = 2.07 + 0.511u + 0.061u^2$

$\;\; = 2.07 + 0.511(2x - 5) + 0.061(2x - 5)^2$,

which simplifies to $y = 1.04 - 0.20x + 0.24x^2$.



   x      u      y     u^2    u^4     uy    u^2 y

  1.0    -3     1.1     9      81    -3.3     9.9
  1.5    -2     1.3     4      16    -2.6     5.2
  2.0    -1     1.6     1       1    -1.6     1.6
  2.5     0     2.0     0       0     0       0
  3.0     1     2.7     1       1     2.7     2.7
  3.5     2     3.4     4      16     6.8    13.6
  4.0     3     4.1     9      81    12.3    36.9

 Totals         16.2    28     196    14.3    69.9
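The normal equations for this example can also be formed and solved mechanically. The sketch below is one possible arrangement, not the author's: the function names are hypothetical, and a small Gaussian elimination routine stands in for the hand solution. It fits the parabola once in terms of u = 2x - 5 and once directly in terms of x.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def polynomial_regression(xs, ys, k, freqs=None):
    """Coefficients b_0..b_k from the k + 1 normal equations."""
    freqs = freqs or [1] * len(xs)
    # Left-hand sides: sums of powers of x from the 0th to the 2k-th.
    a = [[sum(f * x ** (s + t) for f, x in zip(freqs, xs)) for t in range(k + 1)]
         for s in range(k + 1)]
    # Right-hand sides: sums of products x^s * y.
    b = [sum(f * x ** s * y for f, x, y in zip(freqs, xs, ys)) for s in range(k + 1)]
    return solve(a, b)

xs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [1.1, 1.3, 1.6, 2.0, 2.7, 3.4, 4.1]
us = [2 * x - 5 for x in xs]

b_u = polynomial_regression(us, ys, 2)   # parabola in u
b_x = polynomial_regression(xs, ys, 2)   # parabola in x, fitted directly
print([round(v, 3) for v in b_u], [round(v, 3) for v in b_x])
```

The coefficients in u agree with those of the example. The coefficients fitted directly in x differ from the simplified equation only in the second decimal place, because the example rounds the coefficients in u before substituting u = 2x - 5.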



41. Index of correlation 

When considering the line of regression of y on x, we saw that
|r| is the coefficient of correlation between the values $y_i$ and their
estimates $Y_i$ found from the regression equation, and that this
coefficient is connected with the mean square deviation from the
line of regression by the formula $S_y^2 = \sigma_y^2(1 - r^2)$. We propose to
show that a corresponding relation holds for polynomial regression.
Multiplying the normal equations (42') by $b_0, b_1, \dots, b_k$ respectively
and adding we obtain, in virtue of (39),

$$\sum_i f_iY_i(y_i - Y_i) = 0. \tag{43}$$

Consequently (41) is equivalent to

$$NS_y^2 = \sum_i f_iy_i(y_i - Y_i) = \sum_i f_iy_i^2 - \sum_i f_iy_iY_i. \tag{44}$$

From the first of the normal equations, $\sum_i f_i(y_i - Y_i) = 0$, it follows
that the y's and the Y's have the same mean. If this is taken as
origin for both variables, their variances are given by

$$\sigma_y^2 = \frac{1}{N}\sum_i f_iy_i^2, \qquad \sigma_Y^2 = \frac{1}{N}\sum_i f_iY_i^2,$$

and the coefficient of correlation, R, between the $y_i$ and the $Y_i$ by

$$R\sigma_y\sigma_Y = \frac{1}{N}\sum_i f_iy_iY_i = \frac{1}{N}\sum_i f_iY_i^2 = \sigma_Y^2,$$

in virtue of (43), so that

$$R\sigma_y = \sigma_Y. \tag{45}$$

Consequently (44) is equivalent to

$$S_y^2 = \sigma_y^2(1 - R^2), \tag{46}$$

which is analogous to 26 (21).

The coefficient R is thus an indication of the closeness with which
the points $P_i$ of the scatter diagram approximate to the regression
curve (37). If R = 1, then $S_y^2 = 0$, and all the points $P_i$ lie on the
curve. For this reason R is often referred to as the index of
correlation for the regression curve (37). With each curve of regression
there is associated such an index, given by

$$R^2 = 1 - S_y^2/\sigma_y^2, \tag{47}$$

where $S_y^2$ is the mean square deviation of the points $P_i$ from the
curve of regression. In virtue of (43) and (44) $NS_y^2$ in the case
of polynomial regression is given by

$$NS_y^2 = \sum_i f_iy_i^2 - \sum_{s=0}^{k} b_s \sum_{i=1}^{n} f_iy_ix_i^s. \tag{48}$$

Further, the relations (30) and (31) of 27 hold also for polynomial
regression. For, in virtue of (43) and the first of the normal
equations, we have

$$\sum_i f_i(y_i - Y_i)(Y_i - \bar{y}) = 0. \tag{49}$$

Then the sum of the squares of the deviations of the y's from their
mean may be expressed

$$\sum_i f_i(y_i - \bar{y})^2 = \sum_i f_i(y_i - Y_i)^2 + \sum_i f_i(Y_i - \bar{y})^2, \tag{50}$$

in consequence of (49). This equation corresponds to the identity

$$\sigma_y^2 = \sigma_y^2(1 - R^2) + R^2\sigma_y^2,$$

as is evident from (45) and (46).
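The relations of this section can be checked on the parabola fitted in the Example of 40. The sketch below takes the exact solution of that example's normal equations (14.5/7, 14.3/28 and 5.1/84, of which 2.07, 0.511 and 0.061 are the rounded values); the variable names are hypothetical.

```python
from math import sqrt

us = [-3, -2, -1, 0, 1, 2, 3]
ys = [1.1, 1.3, 1.6, 2.0, 2.7, 3.4, 4.1]
b0, b1, b2 = 14.5 / 7, 14.3 / 28, 5.1 / 84    # exact solution of the normal equations

Ys = [b0 + b1 * u + b2 * u * u for u in us]   # ordinates on the fitted curve
N = len(ys)
y_bar = sum(ys) / N
Y_bar = sum(Ys) / N

S2 = sum((y - Y) ** 2 for y, Y in zip(ys, Ys)) / N       # mean square deviation
var_y = sum((y - y_bar) ** 2 for y in ys) / N
var_Y = sum((Y - Y_bar) ** 2 for Y in Ys) / N
cov = sum((y - y_bar) * (Y - Y_bar) for y, Y in zip(ys, Ys)) / N

R_from_47 = sqrt(1 - S2 / var_y)              # index of correlation by (47)
R_as_corr = cov / sqrt(var_y * var_Y)         # correlation between y_i and Y_i
check_43 = sum(Y * (y - Y) for y, Y in zip(ys, Ys))
print(round(R_from_47, 4))
```

Both expressions for R agree, the sum in (43) vanishes to within rounding error, and the y's and the Y's have the same mean, as the argument requires.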

Example. Prove, as in 36, that $\sigma_y^2(\eta_y^2 - R^2)$ is the mean square deviation
of the weighted means of the vertical arrays from the curve of regression.
Also that, if $n_i$ is the frequency in the ith vertical array,

$$\sum_i n_i(\bar{y}_i - \bar{y})^2 = \sum_i n_i(\bar{y}_i - Y_i)^2 + \sum_i n_i(Y_i - \bar{y})^2,$$

corresponding to the identity

$$\eta_y^2\sigma_y^2 = (\eta_y^2 - R^2)\sigma_y^2 + R^2\sigma_y^2.$$

42. Some related regressions

A plotting of the values of x against the logarithms of the
corresponding values of y may indicate an approximation to a linear
relation between x and log y. In such a case we find a regression
equation of the form

$$y = ca^x, \tag{51}$$

where c and a are constants. For this relation is equivalent to

log y = log c + x log a.

If then we find the line of regression of log y on x, the constants of
the equation determine both c and a.

Similarly, if the plotting of log x against log y indicates that the
relation between these quantities approximates to linear, we find
a regression equation of the form

$$y = cx^b, \tag{52}$$

which is equivalent to

log y = log c + b log x.

We have therefore to find the line of regression of log y on log x, and
the constants of the equation determine c and b.

We might also find a regression equation of the form

$$y = ca^{f(x)}, \tag{53}$$

where f(x) is a polynomial in x. For this is equivalent to

log y = log c + (log a) f(x),

and therefore requires a polynomial regression of log y on x.
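The first two of these transformed regressions can be sketched as follows. The function names and the data are hypothetical; the data are generated exactly from curves of the forms (51) and (52), so that the constants should be recovered.

```python
from math import exp, log

def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def fit_exponential(xs, ys):
    """Fit y = c * a**x through the line of regression of log y on x."""
    intercept, slope = fit_line(xs, [log(y) for y in ys])
    return exp(intercept), exp(slope)          # (c, a)

def fit_power(xs, ys):
    """Fit y = c * x**b through the line of regression of log y on log x."""
    intercept, slope = fit_line([log(x) for x in xs], [log(y) for y in ys])
    return exp(intercept), slope               # (c, b)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
c_exp, a = fit_exponential(xs, [3.0 * 1.5 ** x for x in xs])   # hypothetical data
c_pow, b = fit_power(xs, [2.0 * x ** 0.5 for x in xs])         # hypothetical data
print(round(c_exp, 6), round(a, 6), round(c_pow, 6), round(b, 6))
```

It should be kept in mind that the quantity minimized is the sum of squares of the deviations of log y, not of y itself, so that for data which do not lie exactly on the curve the constants found are those of least squares on the logarithmic scale.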

More generally, the principle of least squares may be employed
to find a regression equation of the form*

$$Y = b_1X_1 + b_2X_2 + \dots + b_pX_p,$$

where $X_1, \dots, X_p$ are any functions of the independent variable x.
The argument used in 40 leads to the normal equations

$$\sum_i f_iX_s(x_i)(y_i - Y_i) = 0 \qquad (s = 1, 2, \dots, p)$$

for the determination of the coefficients $b_s$. Multiplying these
equations by $b_1, \dots, b_p$ respectively and adding, we find

$$\sum_i f_iY_i(y_i - Y_i) = 0,$$

so that the sum of squares of the deviations from the regression
curve is

$$NS_y^2 = \sum_i f_iy_i(y_i - Y_i),$$

as in 41. If one of the quantities $X_s$ is taken as unity (or a constant),
the mean of the y's is equal to that of the Y's. The argument of 41
then applies to the present case, leading to (45), (46), (49) and (50).

* Cf. Fisher, Annals of Eugenics, vol. ix, part 3, p. 238 (1939).

EXAMPLES V

1. The heights of brothers, in five families of three each, are
as follows: (67, 68, 69), (68, 68, 71), (68, 70, 72), (70, 70, 73) and
(71, 72, 73) inches. Show that the mean heights for the families
are 68, 69, 70, 71, 72 inches, and the general mean $\bar{x}$ = 70. Also
$\sigma^2$ = 18/5, $\sigma_m^2$ = 2, and the coefficient of intraclass correlation of
heights for the five families is 1/3.

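2. Verify that, for the distribution of Ex. IV, 8, the correlation
ratios are $\eta_x$ = 0.695 and $\eta_y$ = 0.685 approximately.

3. Moments and m.g.f. for the bivariate normal distribution. The
moments may be deduced from the m.g.f. as defined in 31. To
calculate this function let $\sigma_1$ and $\sigma_2$ be the S.D.'s of x and y in the
distribution, and $\rho$ the correlation between these variates. Then
the m.g.f. is given by

$$M(t_1, t_2) = c\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left[t_1x + t_2y - \frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right] dx\,dy,$$

where

$$c = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}.$$

By transforming the exponential we may write this:

$$M(t_1, t_2) = c\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left[-\frac{\{x - \rho\sigma_1y/\sigma_2 - (1 - \rho^2)\sigma_1^2t_1\}^2}{2(1 - \rho^2)\sigma_1^2} + \tfrac{1}{2}(1 - \rho^2)\sigma_1^2t_1^2 + \left(t_2 + \frac{\rho\sigma_1t_1}{\sigma_2}\right)y - \frac{y^2}{2\sigma_2^2}\right] dx\,dy.$$

Carrying out the integration with respect to x, and rearranging the
exponential, we have

$$M(t_1, t_2) = \exp\left[\tfrac{1}{2}(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2)\right] \times \frac{1}{\sigma_2\sqrt{2\pi}}\int_{-\infty}^{\infty} \exp\left[-\frac{(y - \rho\sigma_1\sigma_2t_1 - \sigma_2^2t_2)^2}{2\sigma_2^2}\right] dy$$

$$= \exp\left[\tfrac{1}{2}(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2)\right],$$

which is the required m.g.f.

To calculate the various moments we have only to expand this
function in powers of $t_1$ and $t_2$. The expansion is

$$1 + \tfrac{1}{2}(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2) + \tfrac{1}{8}(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2)^2 + \dots$$

Since $\mu_{rs}$ is the coefficient of $t_1^rt_2^s/r!\,s!$ in this expansion we have

$$\mu_{20} = \sigma_1^2, \quad \mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{02} = \sigma_2^2, \quad \mu_{31} = 3\rho\sigma_1^3\sigma_2, \quad \mu_{22} = (1 + 2\rho^2)\sigma_1^2\sigma_2^2,$$

and so on.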

4. Show that the area of the ellipse, 38 (30), of constant
probability density is $\pi\lambda^2\sigma_1\sigma_2/\sqrt{1 - \rho^2}$; and hence that the area of the
strip between the ellipses corresponding to the parameter values
$\lambda$ and $\lambda + d\lambda$ is $2\pi\lambda\,d\lambda\,\sigma_1\sigma_2/\sqrt{1 - \rho^2}$. Deduce the probability

$$\exp[-\lambda^2/2(1 - \rho^2)]\,\lambda\,d\lambda/(1 - \rho^2)$$

that a pair of values (x, y) chosen at random, will be represented by
a point inside this strip; and hence by integration the probability
that the point will fall inside the ellipse $\lambda$ is

$$1 - \exp[-\lambda^2/2(1 - \rho^2)].$$

If this probability is 1/2, then $\lambda^2 = 1.3863(1 - \rho^2)$.

Also show that, for a given value of $d\lambda$, the probability that the
point (x, y) will fall in the strip between the ellipses $\lambda$ and $\lambda + d\lambda$ is
a maximum when $\lambda^2 = 1 - \rho^2$. This determines the 'ellipse of
maximum probability'. (Cf. Rietz, 1927, 2, pp. 108-10.)

5. The variates x and y are normally correlated, and $\xi$, $\eta$ are
defined by

$$\xi = x\cos\theta + y\sin\theta, \qquad \eta = y\cos\theta - x\sin\theta.$$

Show that $\xi$, $\eta$ will be uncorrelated if

$$\tan 2\theta = 2\rho\sigma_1\sigma_2/(\sigma_1^2 - \sigma_2^2).$$

The above transformation corresponds to a rotation of rectangular
axes of coordinates; and determines the directions of the principal
axes of the ellipses of constant density.

Show that, if $\xi$, $\eta$ are thus uncorrelated, and $\sigma_\xi^2$, $\sigma_\eta^2$ are their
variances, then

$$\sigma_\xi\sigma_\eta = \sigma_1\sigma_2\sqrt{1 - \rho^2}, \qquad \sigma_\xi^2 + \sigma_\eta^2 = \sigma_1^2 + \sigma_2^2.$$

6. Show that, if $\xi$, $\eta$ are independent normal variates, and x, y
are defined by

$$x = \xi\cos\theta + \eta\sin\theta, \qquad y = \eta\cos\theta - \xi\sin\theta,$$

the coefficient of correlation between x and y is given by (cf. Ex. 5)

$$r = \frac{(\sigma_\eta^2 - \sigma_\xi^2)\sin 2\theta}{\sqrt{4\sigma_\xi^2\sigma_\eta^2 + (\sigma_\xi^2 - \sigma_\eta^2)^2\sin^2 2\theta}},$$

which is numerically greatest when $\theta = \tfrac{1}{4}\pi$. The extreme values
of r are $\pm(\sigma_\xi^2 - \sigma_\eta^2)/(\sigma_\xi^2 + \sigma_\eta^2)$.

Also show that the points of inflexion of sections of the normal
correlation surface, by planes through the z-axis, lie on the elliptic
cylinder $\xi^2\sigma_\eta^2 + \eta^2\sigma_\xi^2 = \sigma_\xi^2\sigma_\eta^2$. (Cf. Yule, 1897, 1.)

7. The profits, y, of a certain company in the xth year of its
life are given by

x:    1     2     3     4     5

y: 1250  1400  1650  1950  2300

Taking u = x - 3 and v = (y - 1650)/50, show that the parabolic
regression of v on u is

$$v + 0.086 = 5.3u + 0.643u^2,$$

and deduce that the parabolic regression of y on x is

$$y = 1140 + 72.1x + 32.1x^2.$$

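8. Let x, y be normally correlated variates with zero means as
in 38. Writing

$$w = \frac{x}{\sigma_1}, \qquad z = \frac{1}{\sqrt{1 - \rho^2}}\left(\frac{y}{\sigma_2} - \frac{\rho x}{\sigma_1}\right),$$

show that

$$\frac{\partial(w, z)}{\partial(x, y)} = \frac{1}{\sigma_1\sigma_2\sqrt{1 - \rho^2}}$$

and

$$w^2 + z^2 = \frac{1}{1 - \rho^2}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right).$$

Deduce that the joint probability differential of w and z is

$$dP = \frac{1}{2\pi}\exp[-\tfrac{1}{2}(w^2 + z^2)]\,dw\,dz,$$

and hence that w, z are independent normal variates, with zero
means and unit S.D.'s. In other words w and z are independent
standard normal variates.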



CHAPTER VI 
THEORY OF SIMPLE SAMPLING 

43. Random sampling from a population 

In order to examine a large population with respect to a specified 
characteristic, the statistician chooses a sample of individuals from 
that population and, from the properties of the sample relating to 
the given characteristic, he endeavours to estimate those of the 
population. Suppose that the characteristic considered is the height 
of the individual. Then the assemblage of heights of all the individuals 
in the population is called a population or universe of heights, and 
those of the individuals in the sample is a sample of heights from that 
population. Similarly, we might consider populations of weights, 
wages, yields of grain, etc. In the same way, if our consideration is 
the percentage of male births in a very large population of births, 
we may be obliged, in estimating this percentage, to confine our 
attention to the data provided by a sample of such births. The theory 
of sampling is concerned, first, with estimating the properties of the 
population from those of the sample, and secondly, with gauging 
the precision of the estimates, i.e. with ascertaining the deviations 
from the true values that may be expected in the estimates obtained. 

Fundamental to the theory is the concept of random sampling.
This is defined by the property that, in the selection of an individual
from the population, each member of the population has the same
chance of being chosen.* Statisticians have developed techniques
for ensuring, as far as possible, that their sampling is random; but
the reader who wishes to study the details of these techniques must
consult other works.†

* See also Kendall, 1941, 2.

† E.g. Yule and Kendall, 1937, 1, pp. 336-46.

SAMPLING OF ATTRIBUTES

44. Simple sampling of attributes

In the sampling of attributes, as distinct from the sampling of
values of a variable such as height, we are concerned only with the
possession or non-possession of some specified attribute or
characteristic by the individual selected in sampling. For instance, in sampling
from births we may be concerned only whether the baby is male or
not. In sampling from a population of men our consideration may
be whether they are smokers or non-smokers. The choosing of an
individual in sampling may be called a 'trial', and the possession
of the specified attribute by the individual selected a 'success'.
Simple sampling is random sampling with the further provision that
the probability p of success is the same at each trial. Thus p is a
constant in the process; and the probability of success at any trial
is independent of the success or failure of preceding trials. The value
of p is the relative frequency of the occurrence of the attribute in
the population from which the sample is drawn. Hence, for the
sampling to be simple, either the population must be very large, or
the individual selected must be returned to the population before
the next trial, success or failure having been noted.

The problem connected with the drawing of a simple sample of
n members is thus identical with that of a series of n independent
trials, with constant probability p of success; and the results of
11 and 17 are applicable. The probabilities of 0, 1, 2, ... successes
in a simple sample of n members are thus the terms of the binomial
expansion of $(q + p)^n$. The binomial probability distribution thus
determined is called the sampling distribution of the number of
successes in the sample. The expected value, or mean value, of the
number of successes is therefore np; the variance is npq, and the
standard deviation is $\sqrt{npq}$. This S.D. is usually called the standard
error (S.E.) of the number of successes in a sample of size n, the
deviation from the expected value np being looked upon as 'error'.
The proportion of successes in a sample is obtained by dividing the
number of successes by n. The expected value of the proportion of
successes is therefore p; and the S.D. of the proportion of successes
is 1/n times that of the number of successes, i.e. $\sqrt{pq/n}$. Thus

S.E. of the number of successes = $\sqrt{npq}$,
S.E. of the proportion of successes = $\sqrt{pq/n}$.

The precision of the proportion of successes observed in the sample
is regarded as inversely proportional to the S.E. of this proportion.
Hence the precision of the observed proportion varies as $\sqrt{n}$. In
particular, to double the precision it is necessary to increase the
size of the sample four-fold.
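The two standard errors, and the four-fold rule, may be illustrated by a short sketch. The function names, the value of p and the sample sizes are hypothetical.

```python
from math import sqrt

def se_number(n, p):
    """S.E. of the number of successes in a simple sample of size n."""
    return sqrt(n * p * (1 - p))

def se_proportion(n, p):
    """S.E. of the proportion of successes in a simple sample of size n."""
    return sqrt(p * (1 - p) / n)

p = 0.3                       # hypothetical probability of success
# Quadrupling the sample size halves the S.E. of the proportion.
ratio = se_proportion(400, p) / se_proportion(1600, p)
print(round(ratio, 6))
```

Since precision is taken as inversely proportional to the S.E., the ratio of 2 between the two standard errors corresponds to a doubling of precision.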

45. Large samples. Test of significance 

As we have just seen, the sampling distributions of the number
and the proportion of successes in a simple sample of size n are
binomial distributions. It was also shown in Chapter III that, for
large values of n, the binomial distribution approximates to a
normal distribution, in the sense that the probabilities for
corresponding intervals in the two distributions tend to equality as n
increases indefinitely. Now we know that the probability that a
random value of a normal variate, of S.D. $\sigma$, will lie outside the
interval which extends $3\sigma$ on each side of the mean is only 0.0027.
Similarly, the probability that the value will deviate from the
mean by more than $2\sigma$ is 0.0456, or about 4½ %. We may therefore
conclude that, for large values of n, the probability that the number
of successes in a simple sample of n members will differ from the
mean by more than three times the S.E. is also very small; and that
a deviation of more than twice the S.E. is rather unusual.
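These tail probabilities can be computed from the complementary error function, and compared with an exact binomial calculation. The following is a sketch under stated assumptions: the function names are hypothetical, and the binomial case n = 900, p = 1/3 is chosen only for illustration.

```python
from math import erfc, exp, lgamma, log, sqrt

def normal_two_sided_tail(k):
    """P(|X - mean| > k * sigma) for a normal variate."""
    return erfc(k / sqrt(2))

def binomial_two_sided_tail(n, p, k):
    """Exact P(|X - np| > k * sqrt(npq)) for the number of successes X."""
    q = 1 - p
    mean, se = n * p, sqrt(n * p * q)
    total = 0.0
    for x in range(n + 1):
        if abs(x - mean) > k * se:
            # Work with logarithms so that large binomial coefficients
            # and small powers do not overflow or underflow early.
            log_pmf = (lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
                       + x * log(p) + (n - x) * log(q))
            total += exp(log_pmf)
    return total

tail3 = normal_two_sided_tail(3)                  # about 0.0027
tail2 = normal_two_sided_tail(2)                  # about 0.0455
exact3 = binomial_two_sided_tail(900, 1 / 3, 3)   # hypothetical n and p
print(round(tail3, 4), round(tail2, 4), round(exact3, 4))
```

The computed value for two standard deviations agrees with the figure quoted in the text to three decimal places, and the exact binomial probability for the chosen n and p is of the same small order as the normal value.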

Bearing this in mind we have a test of the credibility of the
hypothesis that a given large sample, of n members, was obtained
by simple sampling from a population in which the relative frequency
of the occurrence of the attribute considered is p. Suppose it is
found that the number of successes in the sample differs from the
expected value np by more than $3\sqrt{npq}$. Then an event has happened
which, on the hypothesis of simple sampling, is very improbable.
We conclude then that the truth of this hypothesis is itself
very improbable, and we say that the difference is highly significant.
Consideration of other aspects of the problem will then lead us to
suspect either that the value of p employed is incorrect, or else that
the conditions of simple sampling were not observed. A deviation
from the mean less than twice the S.E. is regarded as not significant.
For deviations greater than twice the S.E. the significance increases
with the deviation. The dividing line between significance and
non-significance is, of course, not sharply defined. But significance is
usually regarded as beginning where the probability of a larger
deviation is less than 5 %. It may be remarked that, while the
above test may furnish evidence against the hypothesis, it cannot
prove the hypothesis to be correct. The most it can do in its favour
is to provide no evidence against it.

The above argument still holds if, in place of the number of 
successes and its S.E., we employ the proportion of successes in the 
sample and its S.E. The expected value of this proportion in simple
sampling is p; and we compare the deviation of the actual proportion,
from this value, with the S.E., $\sqrt{pq/n}$, of this proportion. In some
cases the value of p in the population is not known, but must be 
estimated from the sample. The estimate obtained from a large 
sample may be used without serious error in place of the true value, 
since the S.E. of the proportion of successes is small when n is large. 

Example. A certain cubical die was thrown 9,000 times, and a 5 or a 6 was
obtained 3,240 times. On the assumption of random throwing, do the data
indicate an unbiased die?

On the hypothesis of an unbiased die the chance of throwing a 5 or a 6 is
1/3. Thus p = 1/3 and q = 2/3. The expected number of successes is therefore
3,000, and the deviation of the actual number from this value is 240. The
S.E., e, of the number of successes is

$e = \sqrt{npq} = \sqrt{9000 \times \tfrac{1}{3} \times \tfrac{2}{3}} = 10\sqrt{20} = 44.72.$

The deviation 240 is nearly 5.4 times this S.E.; and it is therefore most
unlikely to appear as a result of simple sampling with p = 1/3. We therefore
conclude that the die is almost certainly biased, and that p is not equal to 1/3.
The estimate of p obtained from the sample is 3,240/9,000 = 0.36. The
S.E. of the proportion of successes is then

$e' = \sqrt{0.36 \times 0.64/9000} = 0.0051,$

and 3e' = 0.015. It is therefore most unlikely that the true value of p lies
outside the range 0.36 ± 0.015. In other words, the true value of p almost
certainly lies between 0.345 and 0.375.
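
The arithmetic of this example may be checked with a short sketch in Python. The function names `attribute_test` and `three_se_limits` are our own, introduced only for illustration; the first compares the observed number of successes with np in units of $\sqrt{npq}$, and the second forms the rough limits "sample proportion ± three times its estimated S.E." used above.

```python
import math

def attribute_test(n, successes, p):
    """Deviation of the observed number of successes from np, in units of its S.E."""
    q = 1.0 - p
    expected = n * p
    se = math.sqrt(n * p * q)             # S.E. of the number of successes
    ratio = (successes - expected) / se   # deviation divided by the S.E.
    return expected, se, ratio

def three_se_limits(n, successes):
    """Sample proportion plus or minus three times its estimated S.E."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - 3 * se, p_hat + 3 * se

# Die example: 9,000 throws, 3,240 successes, hypothesis p = 1/3.
expected, se, ratio = attribute_test(9000, 3240, 1 / 3)
low, high = three_se_limits(9000, 3240)
print(round(se, 2), round(ratio, 2))
print(round(low, 3), round(high, 3))
```

The ratio printed is the number of standard errors by which the observed count departs from expectation; values beyond 3 correspond to the "highly significant" case of the text.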

46. Comparison of large samples 

Let two populations, $P_1$ and $P_2$, be tested for the prevalence of a
certain attribute, by taking from them large simple samples of $n_1$
and $n_2$ members respectively; and let $p_1$ and $p_2$ be the observed
proportions of successes in the samples. Is the difference, $p_1 - p_2$,
significant of a real difference between the two populations with
respect to the given attribute? On the hypothesis that the populations
are similar in this respect, we may combine the samples to
estimate the common value of the relative frequency of the occurrence
of the attribute in the populations. This estimate is then

$p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}.$  (1)

The S.E.'s of the proportions of successes in samples of $n_1$ and $n_2$
members are $\sqrt{pq/n_1}$ and $\sqrt{pq/n_2}$ respectively; and, since the
samples are independent, the variance $e^2$ of the difference of these
proportions is given by

$e^2 = pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right),$  (2)

in consequence of the theorem proved in §§ 10 and 15.

On the assumption that the populations are similar, the expected
value of the difference $p_1 - p_2$ is zero, for

$E(p_1 - p_2) = E(p_1) - E(p_2) = p - p = 0.$

The sampling distributions of $p_1$ and $p_2$ are approximately normal,
when $n_1$ and $n_2$ are large; and the same is true of their difference
since the samples are independent. Thus the distribution of $p_1 - p_2$
is approximately normal, with mean zero and S.D. e. The probability
that, in simple sampling, the difference $p_1 - p_2$ will be numerically
greater than 3e is therefore very small. The probability that it will
be greater than 2e is in the neighbourhood of 5 %; and any value
smaller than this is regarded as not significant, i.e. as providing no
evidence against the hypothesis.

Example 1. In a simple sample of 600 men from a certain large city, 400
are found to be smokers. In one of 900 from another large city, 450 are
smokers. Do the data indicate that the cities are significantly different with
respect to the prevalence of smoking among men?

Here $p_1 = 2/3$, $p_2 = 1/2$, so that $p_1 - p_2 = 1/6$. On the assumption that the
cities are alike with respect to the prevalence of smoking among men, we have
as our estimate of the common value of p,

$p = \dfrac{400 + 450}{600 + 900} = \dfrac{850}{1500} = \dfrac{17}{30},$

and the variance of the difference of the proportions for the two samples is

$e^2 = \dfrac{17}{30} \times \dfrac{13}{30} \left(\dfrac{1}{600} + \dfrac{1}{900}\right) = 0.000682.$

Hence $e \approx 0.026$. The observed difference is greater than 6e, and is therefore
highly significant. Our assumption that the populations are similar is
therefore almost certainly wrong.
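
As a check on Example 1, the following Python sketch pools the two samples to estimate $p = (n_1 p_1 + n_2 p_2)/(n_1 + n_2)$ and then forms $e^2 = pq(1/n_1 + 1/n_2)$. The function name `compare_proportions` is our own, and the first sample is taken as 600 men, which is the size implied by $p_1 = 2/3$ with 400 smokers.

```python
import math

def compare_proportions(x1, n1, x2, n2):
    """Difference of two sample proportions in units of its S.E. (pooled estimate of p)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                  # pooled estimate of the common p
    q = 1.0 - p
    e = math.sqrt(p * q * (1 / n1 + 1 / n2))   # S.E. of the difference p1 - p2
    return p1 - p2, e, (p1 - p2) / e

# 400 smokers in 600 men, and 450 smokers in 900 men.
diff, e, ratio = compare_proportions(400, 600, 450, 900)
print(round(diff, 4), round(e, 4), round(ratio, 2))
```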

Next suppose that simple samples, of $n_1$ and $n_2$ members, are
drawn from populations in which the proportions are $p_1$ and $p_2$
respectively ($p_1 > p_2$). Is it likely that the proportions $p'_1$ and $p'_2$ in
the samples will be such that $p'_1 - p'_2 \le 0$; in other words, is the real
difference between the populations likely to be hidden in sampling?
As before, the distribution of $p'_1 - p'_2$ is approximately normal for
large values of $n_1$ and $n_2$; but the mean of the distribution is now
$p_1 - p_2$, and its variance, being the sum of the variances of $p'_1$ and
$p'_2$, is given by

$e^2 = \dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}.$  (3)

In order that the sample value of $p'_1 - p'_2$ should be negative, its
deviation from the mean must be on the negative side, and numerically
greater than $p_1 - p_2$. If $p_1 - p_2 > 3e$, the probability of this is
very small. If, however, $p_1 - p_2$ is much less than 2e, such an event
would not be very unusual. For the particular case in which
$p_1 - p_2 = 2e$, the probability of the event is in or near the interval
2 to 2½ %.

Example 2. In two large populations there are 35 and 30 % of fair-haired
people. Is the difference likely to be revealed by simple samples of 1,500
and 1,000 respectively from the two populations?

Here $p_1 = 0.35$ and $p_2 = 0.30$, so that $p_1 - p_2 = 0.05$. The variance of the
difference of the proportions in the samples is

$e^2 = \dfrac{0.35 \times 0.65}{1500} + \dfrac{0.30 \times 0.70}{1000} = 0.000362,$

so that e = 0.019. The difference $p_1 - p_2$ is about 2.6e. The probability that
the real difference between the populations will be hidden is approximately
the probability that, for a random value of a normal variate, the deviation
from the mean will be on the negative side and greater than 2.6 times the
S.D. Since this is less than ½ %, it is unlikely that the difference will be hidden.
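
The probability referred to in Example 2 may be evaluated directly. The sketch below is ours: it takes the variance of the difference as $p_1 q_1/n_1 + p_2 q_2/n_2$, and uses the normal approximation, with the normal integral expressed through the standard-library function `math.erf`. The name `prob_difference_hidden` is hypothetical.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_difference_hidden(p1, n1, p2, n2):
    """Approximate chance that the sample difference has the wrong sign (p1 > p2)."""
    e = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # S.E. of the difference
    ratio = (p1 - p2) / e                                    # true difference in units of e
    return e, ratio, normal_cdf(-ratio)

e, ratio, prob = prob_difference_hidden(0.35, 1500, 0.30, 1000)
print(round(e, 3), round(ratio, 2), round(prob, 4))
```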

47. Poissonian and Lexian sampling. Samples of varying size 

Consider next a few modifications of the condition of simple
sampling, beginning with Poisson's series of trials (cf. § 11, Ex. 3).
Suppose that, in drawing a sample of attributes of n members, the
chance of success changes at each drawing. Let $p_i$ be the probability
of success at the ith drawing. Then the expected value of the
number of successes in the sample is the sum of the expectations
at the individual drawings; and this is

$\sum p_i = np,$

where p is the mean of the quantities $p_i$. Also, since the drawings
are independent, the variance $e^2$ of the number of successes in the
sample is the sum of the variances of the numbers of successes at
the separate drawings, so that

$e^2 = \sum p_i q_i = \sum p_i - \sum p_i^2.$

If $\sigma_p^2$ is the variance of the quantities $p_i$, we may write this

$e^2 = np - n(p^2 + \sigma_p^2) = npq - n\sigma_p^2.$  (4)

Thus the variance of the number of successes is less than when the
probability remains constant and equal to p. Dividing by $n^2$ we
have the variance $e'^2$ of the proportion of successes in a Poissonian
sample

$e'^2 = \dfrac{pq}{n} - \dfrac{\sigma_p^2}{n}.$  (5)

Consider next a Lexian series of trials.* Suppose that, in taking
N simple samples of attributes of n members each, the probability
of success varies from one sample to another. Let $p_i$ be the value
in the ith sample (i = 1, 2, ..., N). We wish to find the S.E. of the
number of successes per sample, when the records of all the samples
are pooled. Let p be the mean value of the probability, so that
$Np = \sum p_i$. Then the expected value of the number of successes in
the whole series of samples is equal to the sum of the expected values
for the individual samples, and this is

$\sum n p_i = nNp.$

Hence the expected value of the number of successes per sample is
np. To find the variance $e^2$ of the number of successes per sample,
we observe that the mean square deviation from np in the ith
sample is $n p_i q_i + (n p_i - np)^2$.

Summing for all the N samples, and equating to $Ne^2$, we have

$Ne^2 = n \sum p_i q_i + n^2 \sum (p_i - p)^2.$

* Studied by W. Lexis in 1877.

Now the last term has the value $n^2 N \sigma_p^2$, where $\sigma_p^2$ is the variance of
the quantities $p_i$. In the other sum we may write

$\sum p_i q_i = \sum p_i - \sum p_i^2 = Np - N(p^2 + \sigma_p^2) = Npq - N\sigma_p^2.$

Substituting these values in the equation, and dividing by N, we
obtain

$e^2 = npq + n(n - 1)\sigma_p^2,$  (6)

which is the required variance of the number of successes per sample.
Dividing by $n^2$ we have the variance of the proportion of successes
per sample, viz.

$e'^2 = \dfrac{pq}{n} + \dfrac{n - 1}{n}\,\sigma_p^2.$  (7)

Both the variances (6) and (7) are greater than in the case of simple
sampling with constant probability p.
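
The two results may be verified in exact arithmetic for any chosen set of probabilities. The sketch below is ours, with an arbitrary illustrative set of four probabilities: it compares the direct sum $\sum p_i q_i$ with the Poissonian form $npq - n\sigma_p^2$, and the averaged mean square deviation $n p_i q_i + (n p_i - np)^2$ with the Lexian form $npq + n(n - 1)\sigma_p^2$.

```python
from fractions import Fraction

def mean_and_variance(values):
    """Mean and variance (divisor = number of values) of a list of Fractions."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

# Illustrative probabilities (an assumption; any values in (0, 1) will do).
probs = [Fraction(1, 10), Fraction(3, 10), Fraction(1, 2), Fraction(7, 10)]
p_bar, var_p = mean_and_variance(probs)

# Poissonian sampling: one drawing at each probability, so n = len(probs).
n = len(probs)
direct_poisson = sum(p * (1 - p) for p in probs)
formula_poisson = n * p_bar * (1 - p_bar) - n * var_p

# Lexian sampling: N = len(probs) samples of n_lex members each.
n_lex = 5
N = len(probs)
direct_lexis = sum(
    n_lex * p * (1 - p) + (n_lex * p - n_lex * p_bar) ** 2 for p in probs
) / N
formula_lexis = n_lex * p_bar * (1 - p_bar) + n_lex * (n_lex - 1) * var_p

print(direct_poisson, formula_poisson)
print(direct_lexis, formula_lexis)
```

Because `Fraction` arithmetic is exact, the two members of each pair agree identically, and not merely to rounding error.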

Lastly, we may consider the modification of simple sampling in
which the probability p of success remains constant, but the size n
of the sample varies about a mean $\bar{n}$ with variance $\sigma_n^2$. To find the
variance of the number of successes per sample, we observe first
that the mean number of successes per sample is $\bar{n}p$. Then, for
samples of size n, the mean square deviation of the number of
successes from np is npq. Consequently the mean square deviation
from the general mean $\bar{n}p$ is

$npq + (np - \bar{n}p)^2.$

The expected value of this mean square is the required variance of
the number of successes, so that

$e^2 = \bar{n}pq + p^2\sigma_n^2.$  (8)

We shall make use of this result in a later chapter. 
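
A small exact check of this result may be helpful. In the sketch below, which is ours, the sample size is assumed to take the values 2, 4 and 6 with equal probability, and p = 1/3; the second moment of the number of successes about the general mean $\bar{n}p$ is found by mixing the three binomial distributions, and is compared with $\bar{n}pq + p^2\sigma_n^2$.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Exact binomial probabilities of 0, 1, ..., n successes."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

sizes = [2, 4, 6]          # equally likely sample sizes (illustrative assumption)
p = Fraction(1, 3)

n_bar = Fraction(sum(sizes), len(sizes))
var_n = sum((n - n_bar) ** 2 for n in sizes) / len(sizes)
general_mean = n_bar * p

# Mean square deviation about the general mean, averaged over the sizes.
direct = sum(
    prob_k * (k - general_mean) ** 2
    for n in sizes
    for k, prob_k in enumerate(binomial_pmf(n, p))
) / len(sizes)

formula = n_bar * p * (1 - p) + p**2 * var_n
print(direct, formula)
```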

SAMPLING OF VALUES OF A VARIABLE 

48. Random and simple sampling 

We pass now to the consideration of sampling of values of a 
variable and measurable quantity, such as height, age, yield of 
grain, etc. Each member of the population of individuals, objects 
or experiments provides a value of the variable; and we thus have 
a population of values of the variable, and the frequency distribution
determined by it. In drawing a sample of n members from the
population we are choosing n values of the variable from those of 
the distribution. 

We have already defined random sampling as sampling in which 
each member of the population has the same chance of being chosen. 
In the case of a population of discrete values of the variable x, the
value $x_i$ may occur $f_i$ times. There are then $f_i$ members of the population
each equal to $x_i$. Hence the probability of the value $x_i$ in the
selection of an individual by random sampling is $f_i/N$, which is the
relative frequency of that value in the population. Similarly, if the
population of values of x has a continuous distribution with relative
frequency density f(x), the probability that, in the random selection
of an individual, the value of the variable will fall in the interval
dx is f(x) dx. Simple sampling is random sampling with the further
provision that, in the selection of an individual from the population,
the probability of obtaining a value of the variable within any specified
range remains constant throughout the sampling. In particular, with
a population of discrete values of the variable, the probability of obtaining
a specified value $x_i$ remains constant during the sampling.
Thus, in simple sampling, the system of probabilities associated with 
any drawing is independent of the results of preceding drawings. 

A population, whose distribution is continuous, contains an 
infinite number of values in any finite interval in the range of the 
variable. In drawing a finite random sample from such a population, 
the probability associated with any interval remains unchanged, 
and the sampling is therefore simple. Thus a finite random sample 
from a population whose distribution is continuous is a simple sample. 
It is the common practice to refer to such a sample as a 'random
sample'. If, however, the population contains only a limited number
of values, the sampling will not be simple unless each value selected 
is returned to the population before the drawing of the next value. 

49. Sampling distributions. Standard errors 

The distribution of the variable in the population has its mean, 
variance, moments of higher order, partition values, etc., which 
are spoken of generally as the parameters of the population. Similarly,
each simple sample from the population determines a frequency
distribution of the variable, from which the mean, variance,
etc., of the sample may be calculated. These may be regarded as 
estimates of the values of the parameters of the population. Any 
such estimate obtained from the sample is called a statistic. More 
generally any function of the sample values, used as an estimate of 
a parameter of the population, is called a statistic. 

When the distribution of the variable x in the population is 
known, it is theoretically possible to determine the probability 
that the estimate z of any parameter, obtained from a simple sample 
of n members, will lie in the interval dz. Let this probability be
denoted by $\phi(z)\,dz$. Then the density $\phi(z)$ determines a probability
distribution called the sampling distribution of that statistic for
simple samples of size n. Thus the sampling distribution is a continuous
distribution, determined by the nature of the population
and the size of the sample; and the S.D. of the sampling distribution
is called the standard error (S.E.) of that statistic for samples of n
members. The sampling distribution is often defined as the distribution
of the values of the statistic obtained from an infinite
(or very large) number of simple samples of the given size. This
alternative way of looking at it may be a help to the student. The
two definitions bear the same relation to each other as the a priori
definition of probability and the empirical definition. The reader
should bear in mind that a sampling distribution is essentially a 
probability distribution. 

Distributions of statistics, for random samples from a normal 
population, will play a prominent part in later chapters. It was 
shown in § 22 that the distribution of the means of such samples is
normal; and we know its S.D. In general, for normal populations, or
populations whose frequency curves are unimodal and only moderately
skew, the sampling distributions of many of the common
statistics approximate to the normal type as n increases indefinitely;
so that, for large samples, they possess the property that the probability
of a sample value of the statistic deviating from its mean by
more than three times its S.E. is very small. This enables us to apply
the test of § 45 to statistics obtained from large samples; and the
determination of the S.E. in such cases is a matter of importance.
Tests appropriate to small samples will be considered in Chapter X.

50. Sampling distribution of the mean

The distribution of the means of random samples of given size, 
from a specified population, is a question of considerable importance. 
Let the sampled population have mean $\mu$ and variance $\sigma^2$. We shall
first prove by elementary methods that the sampling distribution
of the mean $\bar{x}$, of random samples of size n, has $\mu$ for its mean and
$\sigma^2/n$ for its variance. We shall then prove more generally how all
the moments of the distribution of the mean may be deduced simply
from those of the population, by means of the properties of the
cumulative function.

Reasoning in terms of a population of discrete values, let the relative
frequency of the value $x_i$ in the population be $p_i$ (i = 1, 2, ..., k).
Then

$\mu = \sum_i p_i x_i$  (9)

and

$\sigma^2 = \sum_i p_i (x_i - \mu)^2.$  (10)

In the case of a sample of one member, the distribution of the mean
is clearly the distribution of the variable in the population. For the
single value $x_s$ in the sample, associated with probability $p_s$, is the
mean of the sample. The expected value of the mean of the sample
is therefore

$E(\bar{x}) = \sum_i p_i x_i = \mu.$

Consequently the variance of the distribution of the mean of a
sample of one is

$\sum_i p_i (x_i - \mu)^2 = \sigma^2,$

as stated. For a sample of n values $x_j$ (j = 1, 2, ..., n), the mean $\bar{x}$
is given by

$n\bar{x} = \sum_j x_j.$

Taking the expected value of each member we have

$E(n\bar{x}) = E\left(\sum_j x_j\right) = \sum_j E(x_j) = \sum_j \mu = n\mu,$
so that $E(\bar{x}) = \mu.$  (11)

Thus the mean of the population is the mean of the sampling distribution
of $\bar{x}$. Further, since the values in the sample are independent,
the sampling variance of their sum, $n\bar{x}$, is the sum of the variances
of the separate values. But each of these values is a sample of one
member, whose distribution has variance $\sigma^2$. Consequently the
variance of $n\bar{x}$ is $n\sigma^2$, and its S.D. is $\sigma\sqrt{n}$. The S.D. of $\bar{x}$ is therefore
$\sigma/\sqrt{n}$. This is the required S.E. of the mean.
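
The result may be confirmed by direct enumeration for a small discrete population. The sketch below is ours; the three values and their relative frequencies are chosen arbitrarily for illustration. It lists every possible simple sample of size n, with its probability, and computes the mean and variance of the resulting distribution of the sample mean, which should be $\mu$ and $\sigma^2/n$.

```python
from fractions import Fraction
from itertools import product

# Illustrative discrete population (an assumption): values and relative frequencies.
values = [0, 1, 3]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

mu = sum(p * x for x, p in zip(values, probs))
sigma2 = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

def sampling_mean_and_variance(n):
    """Exact mean and variance of the mean of a simple sample of n values."""
    first = Fraction(0)
    second = Fraction(0)
    for draw in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for i in draw:
            prob *= probs[i]          # drawings are independent
        xbar = Fraction(sum(values[i] for i in draw), n)
        first += prob * xbar
        second += prob * xbar**2
    return first, second - first**2

m4, v4 = sampling_mean_and_variance(4)
print(mu, m4)
print(sigma2 / 4, v4)
```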

If the distribution of values in the population is continuous, with
relative frequency density f(x), the relative frequency for the interval
dx is f(x) dx; and this is the probability of a random value
coming from that interval. To adapt the above argument we have
only to replace $p_i$ by f(x) dx, $x_i$ by x, and summation with respect
to i by integration extending over the values in the population.
Thus (9) and (10) become

$\mu = \int x f(x)\,dx$

and

$\sigma^2 = \int (x - \mu)^2 f(x)\,dx.$



Summation with respect to j remains unaltered, since it extends 
only over the n values of the sample. 

Example. A sample of 900 members is found to have a mean of 3.4 cm.
Could it be reasonably regarded as a simple sample from a large population,
whose mean is 3.25 cm. and S.D. 2.61 cm.?

The S.E. of the mean of a simple sample of 900 from such a population is
2.61/30 = 0.087 cm. The deviation of the mean of the sample from that of
the population is 0.15 cm. This deviation is less than twice the S.E. of the
mean, and is therefore not significant. We conclude that the given sample
might be one drawn from the population specified.
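
The same comparison may be written as a short Python sketch; the function name `mean_deviation_in_se` is our own. It returns the S.E. of the mean, $\sigma/\sqrt{n}$, and the deviation of the sample mean from the population mean expressed as a multiple of that S.E.

```python
import math

def mean_deviation_in_se(sample_mean, n, pop_mean, pop_sd):
    """S.E. of the mean and the deviation of the sample mean in units of it."""
    se = pop_sd / math.sqrt(n)
    return se, (sample_mean - pop_mean) / se

# Sample of 900 with mean 3.4 cm.; population mean 3.25 cm., S.D. 2.61 cm.
se, ratio = mean_deviation_in_se(3.4, 900, 3.25, 2.61)
print(round(se, 3), round(ratio, 2))
```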

The various moments of the sampling distribution of $\bar{x}$ may be
found very simply* by using the additive property of cumulants,
proved in § 16. For, since $n\bar{x} = \sum x_j$, it follows from this property
that the cumulative function of $n\bar{x}$, being the sum of those of the
variates $x_j$, is given by

$K(t; n\bar{x}) = nK(t; x) = n\left(\kappa_1 t + \kappa_2 \dfrac{t^2}{2!} + \kappa_3 \dfrac{t^3}{3!} + \dots\right),$

the cumulants $\kappa_r$ being those of the population, and the second
argument in brackets denoting the variate whose cumulative
function is indicated. But the rth moment of $\bar{x}$ is obtained from that
of $n\bar{x}$ by dividing by $n^r$. Hence the cumulative function of $\bar{x}$ is found
by substituting t/n for t in the above expansion. The cumulative
function of the distribution of the mean is therefore

$K(t; \bar{x}) = \kappa_1 t + \dfrac{\kappa_2}{n}\,\dfrac{t^2}{2!} + \dfrac{\kappa_3}{n^2}\,\dfrac{t^3}{3!} + \dots$

The rth cumulant for the distribution of the mean of the sample
is thus found from that of the population by dividing by $n^{r-1}$. In
particular, the mean of the distribution of $\bar{x}$ is $\kappa_1$, and is therefore
the same as the mean of the population. The variance of $\bar{x}$ is $\sigma^2/n$,
and the third moment about the mean of the distribution is $\mu_3/n^2$.

* Cf. Fisher, 1929, 1, p. 202.

That the sampling distribution of the mean is approximately 
normal for large samples, when the population has only moderate 
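skewness and excess, also follows from the results of § 16. For,
since the rth cumulant of the distribution of the mean is obtained
from that of the population by dividing by $n^{r-1}$, the skewness of the
distribution of the mean is

$\dfrac{\kappa_3}{n^2}\left(\dfrac{n}{\kappa_2}\right)^{3/2} = \dfrac{1}{\sqrt{n}}\,\dfrac{\kappa_3}{\kappa_2^{3/2}} = \dfrac{1}{\sqrt{n}}$ (skewness of the population).

Similarly, the excess of kurtosis of the distribution of the mean is

$\dfrac{\kappa_4}{n^3}\left(\dfrac{n}{\kappa_2}\right)^{2} = \dfrac{1}{n}\,\dfrac{\kappa_4}{\kappa_2^{2}} = \dfrac{1}{n}$ (excess of population).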

Thus, for large samples from a population of moderate skewness 
and excess, the skewness and excess of the distribution of x are 
small, and the distribution is approximately normal. 
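
These scaling rules may also be checked by enumeration. The sketch below is ours and uses an arbitrary skew population of three values. To keep the arithmetic exact it works with the squared skewness $\mu_3^2/\mu_2^3$, which should be divided by n, and with the excess $\mu_4/\mu_2^2 - 3$, which should likewise be divided by n, in passing from the population to the distribution of the mean.

```python
from fractions import Fraction
from itertools import product

# Illustrative skew population (an assumption).
values = [0, 1, 4]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

def central_moments(outcomes):
    """Second, third and fourth moments about the mean of (value, probability) pairs."""
    mean = sum(p * x for x, p in outcomes)
    return tuple(sum(p * (x - mean) ** r for x, p in outcomes) for r in (2, 3, 4))

def distribution_of_mean(n):
    """All simple samples of size n, as (sample mean, probability) pairs."""
    outcomes = []
    for draw in product(range(len(values)), repeat=n):
        prob = Fraction(1)
        for i in draw:
            prob *= probs[i]
        outcomes.append((Fraction(sum(values[i] for i in draw), n), prob))
    return outcomes

n = 3
m2, m3, m4 = central_moments(list(zip(values, probs)))
s2, s3, s4 = central_moments(distribution_of_mean(n))

print(s3**2 / s2**3, (m3**2 / m2**3) / n)    # squared skewness
print(s4 / s2**2 - 3, (m4 / m2**2 - 3) / n)  # excess
```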

51. Normal population. Fiducial limits for unknown mean 

Suppose that the population, from which the random sample of
n values is drawn, is a normal population with mean $\mu$ and S.D. $\sigma$.
Then the sample values are independent normal variates, with the
same mean and S.D.; and, by the theorem of § 22, their mean $\bar{x}$ is
normally distributed with mean $\mu$ and S.D. $\sigma/\sqrt{n}$. If we know $\sigma$
but not $\mu$, there is a range of possible values of $\mu$ for which the
observed mean $\bar{x}$ of the sample is not significant at any specified
level of probability. In the sampling distribution of the mean, the
relative deviation of $\bar{x}$ from its expected value is the ratio of $\bar{x} - \mu$
to the S.E. of $\bar{x}$; and this ratio is $(\bar{x} - \mu)\sqrt{n}/\sigma$. If then the observed
value $\bar{x}$ is not significant at the 5 % level of probability, this
relative deviation must be less than that which, in a normal distribution,
is exceeded numerically with a probability of 5 %. Such
a relative deviation is 1.96. Consequently the observed mean $\bar{x}$
will be not significant provided

$|\bar{x} - \mu|\,\sqrt{n}/\sigma < 1.96,$

which requires

$\bar{x} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{x} + 1.96\,\sigma/\sqrt{n}.$

The values $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ are called the 95 % fiducial limits, or confidence
limits, for the mean of the population corresponding to the
given sample. They are the limits within which $\mu$ must lie, in order
that the observed sample mean should not be significant at the
prescribed level of probability.

Similarly, we define the fiducial limits for other levels of probability.
Thus, in a normal distribution, the relative deviations
which are exceeded numerically with probabilities of 2 and 1 % are
2.33 and 2.58 respectively. Hence the 98 % fiducial limits for the
mean of the normal population, corresponding to the given sample,
are $\bar{x} \pm 2.33\,\sigma/\sqrt{n}$; and the 99 % fiducial limits are $\bar{x} \pm 2.58\,\sigma/\sqrt{n}$.



Example. Show that, in the example of the preceding section, if the
population is normal but its mean unknown, the 95 % fiducial limits for the
mean are 3.23 and 3.57 cm., and the 98 % fiducial limits 3.20 and 3.60 cm.
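
The limits asked for may be computed as follows. This sketch is ours: it forms the limits as the sample mean ± (relative deviation) × $\sigma/\sqrt{n}$, obtaining the relative deviation from the standard-library class `statistics.NormalDist` instead of from a table; the function name `fiducial_limits` is hypothetical.

```python
import math
from statistics import NormalDist

def fiducial_limits(sample_mean, sd, n, level):
    """Limits for the population mean at the given level (e.g. 0.95), S.D. known."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # 1.96 for 95 %, 2.33 for 98 %
    half_width = z * sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Sample of 900 with mean 3.4 cm., population S.D. 2.61 cm.
low95, high95 = fiducial_limits(3.4, 2.61, 900, 0.95)
low98, high98 = fiducial_limits(3.4, 2.61, 900, 0.98)
print(round(low95, 2), round(high95, 2))
print(round(low98, 2), round(high98, 2))
```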

52. Comparison of the means of two large samples 

Given two independent simple samples, of $n_1$ and $n_2$ members
respectively, we may wish to examine whether the difference of
their means may be accounted for by fluctuations of sampling, the
two samples being regarded as drawn from the same population of
S.D. $\sigma$. The S.E.'s of the means of samples of $n_1$ and $n_2$ members
from this population are $\sigma/\sqrt{n_1}$ and $\sigma/\sqrt{n_2}$ respectively. Hence, on
the assumption that the samples are independent and drawn from
this population, the S.E. e of the difference of their means is given by

$e^2 = \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right).$  (12)

The sampling distribution of this difference has zero for its mean,
and is approximately normal if $n_1$ and $n_2$ are large. Consequently,
if the observed difference of the means exceeds 3e, it can hardly be
ascribed to fluctuations of sampling; and our assumption that the
samples were drawn from the same population is almost certainly
incorrect. If the difference is greater than 2e, it is regarded as
significant at the 5 % level of probability. When the variance of the
population is not known, it may be estimated from the combined
sample of $n_1 + n_2$ members, unless the variances of the two samples
are inconsistent with the assumption that they were drawn from
the same population (see § 60 below).

If the two samples are known to have come from different populations,
with variances $\sigma_1^2$ and $\sigma_2^2$ respectively, we can test by a
similar procedure whether the two populations may have the same
mean. The standard errors of the means of samples of $n_1$ and $n_2$
members from the two populations are $\sigma_1/\sqrt{n_1}$ and $\sigma_2/\sqrt{n_2}$ respectively;
and the S.E. e of their difference is then given by

$e^2 = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}.$  (13)

On the assumption that the two populations have the same mean,
the distribution of the difference of the means of the samples has
zero for its expected value, and is approximately normal for large
samples. We may therefore test the significance of the difference of
the sample means, by comparing it with e in the usual manner. As
before, if the variances of the populations are not known, they may
be estimated from the large samples.

Example. A simple sample of heights of 6,400 Englishmen has a mean of
67.85 in. and a S.D. of 2.56 in., while a simple sample of heights of 1,600
Australians has a mean of 68.55 in. and a S.D. of 2.52 in. Do the data indicate that
Australians are on the average taller than Englishmen?

The S.E. of the mean of a sample of heights of 6,400 Englishmen is
2.56/80 = 0.032 in., and that of the mean of a sample of heights of 1,600
Australians is 2.52/40 = 0.063 in. The S.E. of the difference of the means is

$e = \sqrt{(0.032)^2 + (0.063)^2} = \sqrt{0.004993} = 0.07$ in.

The observed difference between the means of the samples is 0.70 in., which is
about 10 times its S.E. Hence the data are inconsistent with the assumption that
the means of the two populations are equal; and we conclude that Australians 
are on the average taller than Englishmen. 
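
The computation in this example may be sketched in Python as follows. The function name `difference_of_means` is our own; the S.E. of the difference is taken as the square root of $\sigma_1^2/n_1 + \sigma_2^2/n_2$, with the sample S.D.'s used as estimates of the population values, as the text allows for large samples.

```python
import math

def difference_of_means(mean1, sd1, n1, mean2, sd2, n2):
    """Difference of two sample means, its S.E., and their ratio."""
    e = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    diff = mean2 - mean1
    return diff, e, diff / e

# 6,400 Englishmen (67.85 in., S.D. 2.56) and 1,600 Australians (68.55 in., S.D. 2.52).
diff, e, ratio = difference_of_means(67.85, 2.56, 6400, 68.55, 2.52, 1600)
print(round(diff, 2), round(e, 4), round(ratio, 1))
```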

53. Standard error of a partition value

Consider the S.E. of a partition value for a large random sample
of n members, drawn from a continuous population in which the
relative frequency density of the variable is f(x). Let the partition
value be that for which p is the fraction of the frequency lying
above it, and q the fraction below it. Also let $x_p$ be the partition
value for the population, and $x_p + \delta x_p$ that for a large sample of n
items. The relative frequency of values above $x_p$ in the population
is p. In the sample the relative frequency above $x_p + \delta x_p$ is p, and
above $x_p$ it is $p + \delta p$. Thus $\delta p$ is the relative frequency in the sample
for the interval $\delta x_p$. Now for a large sample $\delta x_p$ and $\delta p$ are small;
and $\delta p$ is, to within infinitesimals of higher order, the relative
frequency for the interval $\delta x_p$ in the population. Consequently

$\delta p = y\,\delta x_p,$

where y is the ordinate at $x_p$ for the frequency curve y = f(x) of
the population. On squaring we have

$(\delta p)^2 = y^2 (\delta x_p)^2.$

Now y is independent of the sample, and $\delta x_p$ and $\delta p$ vary about
zero as mean. Hence, on taking expected values of each side,
we obtain

$E(\delta p)^2 = y^2 E(\delta x_p)^2.$

Now $E(\delta x_p)^2$ is the variance of the sampling distribution of $x_p$, and
its square root is the S.E. of this partition value. Similarly, $E(\delta p)^2$
is the variance of the proportion of successes in n trials, with constant
probability p that the value of the variable selected will be
greater than $x_p$; and we know that this variance is pq/n. Consequently,
on taking the square root of both sides of the above
equation, we obtain the required S.E. e of the partition value* as

$e = \dfrac{1}{y}\sqrt{\dfrac{pq}{n}}.$  (14)

If the value of y for the population is not given, it may be estimated
from the frequency distribution of the large sample.

* See also Kendall, 1940, 4.
An important case is that in which the distribution of the variable
in the population is normal, with S.D. $\sigma$. The value of p is the area
under the normal curve to the right of the partition value; and the
table of areas in § 21 enables us to read off the value of $x_p/\sigma$. The
table of ordinates for the normal curve in § 20 then gives the corresponding
value of $\sigma y$; and substitution of the values of y, p, q and n
in (14) gives the required S.E. For example, in the case of the median

$p = \tfrac{1}{2}, \quad q = \tfrac{1}{2}, \quad x_p/\sigma = 0, \quad \sigma y = 0.3989.$

Hence the S.E. of the median, for a large sample of n members from
a normal population, is

$\dfrac{\sigma}{0.3989}\sqrt{\dfrac{\tfrac{1}{2} \times \tfrac{1}{2}}{n}} = 1.25\,\dfrac{\sigma}{\sqrt{n}}.$

This is 25 % greater than the S.E. of the mean. For the upper
quartile $p = \tfrac{1}{4}$, the area from the mean to the quartile is 0.25, giving

$x_p/\sigma = 0.6745, \quad \sigma y = 0.3178.$

The S.E. of a quartile is therefore

$\dfrac{\sigma}{0.3178}\sqrt{\dfrac{\tfrac{1}{4} \times \tfrac{3}{4}}{n}} = 1.36\,\dfrac{\sigma}{\sqrt{n}}.$

Example. Prove in the same manner that, for large samples of n members
from a normal population of S.D. $\sigma$,

S.E. of 1st and 9th deciles = $1.71\,\sigma/\sqrt{n}$,
S.E. of 2nd and 8th deciles = $1.43\,\sigma/\sqrt{n}$,
S.E. of 3rd and 7th deciles = $1.32\,\sigma/\sqrt{n}$,
S.E. of 4th and 6th deciles = $1.27\,\sigma/\sqrt{n}$.
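
The coefficients quoted for the median, quartiles and deciles may be reproduced without tables. The sketch below is ours and assumes the form $e = \sqrt{pq/n}\,/\,y$ for the S.E. of a partition value; for a normal population this is $c\,\sigma/\sqrt{n}$ with $c = \sqrt{pq}/(\sigma y)$, where $\sigma y$ is the ordinate of the standard normal curve at $x_p/\sigma$ (measured from the mean). The function name `partition_se_coefficient` is hypothetical.

```python
import math
from statistics import NormalDist

def partition_se_coefficient(p):
    """Coefficient c in S.E. = c * sigma / sqrt(n), p being the fraction above the value."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - p)   # x_p / sigma, measured from the mean
    ordinate = std.pdf(z)      # sigma * y for the normal curve
    return math.sqrt(p * (1.0 - p)) / ordinate

cases = [
    ("median", 0.5),
    ("quartile", 0.25),
    ("1st and 9th deciles", 0.1),
    ("2nd and 8th deciles", 0.2),
    ("3rd and 7th deciles", 0.3),
    ("4th and 6th deciles", 0.4),
]
for name, p in cases:
    print(name, round(partition_se_coefficient(p), 2))
```

Since the coefficient depends on p only through pq and the ordinate, which is symmetrical about the mean, the same value serves for a partition value and its mirror image.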

EXAMPLES VI

1. A biased coin was thrown 400 times, and heads resulted 240
times. Find the S.E. of the proportion of heads in 400 throws; and
deduce that the probability of throwing heads in a single trial
almost certainly lies between 0.53 and 0.67.

2. A random sample of 500 pineapples was taken from a large
consignment, and 65 were found to be bad. Show that the S.E. of
the proportion of bad ones in a sample of this size is 0.015; and deduce
that the percentage of bad pineapples in the consignment almost
certainly lies between 8.5 and 17.5.

3. In a random sample of 800 adults from the population of a 
certain large city, 600 are found to have dark hair. In a random 
sample of 1,000 adults from the inhabitants of another large city, 
700 are dark-haired. Show that the difference of the proportions of 
dark-haired people is nearly 2.4 times the S.E. of this difference for
samples of the above sizes.

4. In two large populations there are 30 and 25 % respectively
of fair-haired people. Is this difference likely to be hidden in samples
of 1,200 and 900 respectively from the two populations? (The
difference, 0.05, in the proportions is more than 2½ times the S.E.
of this difference for such samples. Hence it is unlikely that the real
difference will be hidden.)

5. Given that, on the average, 4 % of insured men of age 65 die 
within a year, and that 60 of a particular group of 1,000 such men 
died within a year, show that this group cannot be regarded as a 
representative sample, seeing that the actual deviation of the 
proportion of deaths is more than three times the S.E. of the pro- 
portion for samples of this size. 

6. In sampling of attributes a sample is drawn containing an even
number n of members. In drawing each of the first ½n members the
probability of success is p, and for each of the remaining ones it is
$1 - p$, the drawings being independent. Show that the expected
number of successes is ½n, and the variance of the number of successes
is npq.

7. The sampling distribution of a certain statistic is normal,
with mean 2.0 and S.E. 1.5. Show that the probability that a simple
sample will yield a value of the statistic greater than 5.0 is 0.02275;
and the probability that the value found will lie outside the range
$-1.0$ to 5.0 is 0.0455.

Also find c if the probability of getting a value of the statistic
greater than c is 0.07. (Ans. c = 4.213.)

8. A normal population has a mean of 0.1 and a S.D. of 2.1. Find
the probability that the mean of a simple sample of 900 members
will be negative. (Ans. 0.077 nearly.)

9. A sample of 900 members is found to have a mean of 3.47 cm.
Can it be reasonably regarded as a simple sample from a large
population with mean 3.23 cm. and S.D. 2.31 cm.? (Ans. No. The
deviation $\bar{x} - \mu$ is more than three times the S.E. of the mean.)

10. The means of simple samples of 1,000 and 2,000 are 67.5
and 68.0 in. respectively. Can the samples be regarded as drawn
from the same population of S.D. 2.5 in.? (Ans. No. The difference
of the means is more than 5 times the S.E. of the difference, which
is 0.097 in. nearly.)

11. Show that the S.E. of the 8th decile of a simple sample of 900
members drawn from a normal population of S.D. 2.5 cm. is 0.12 cm.
nearly.

12. Prove that the coefficient of correlation between errors in
the partition values corresponding to proportions p and p' of the
frequency lying above them is, for large samples from a continuous
population, $\sqrt{p'q/(pq')}$, in which p > p'. In particular, for the lower
and upper quartiles this coefficient is 1/3. (Cf. Yule and Kendall,
1937, 1, p. 385.)

13. Using the value $\epsilon = 1.363\sigma/\sqrt{n}$ for the S.E. of the lower (or
upper) quartile in a large sample of n values from a normal
population of S.D. $\sigma$, find the S.E. of the semi-interquartile range.

Since the interquartile range is the difference between the upper
and lower quartiles, and the correlation between errors in these
statistics in a large sample is 1/3 (cf. Ex. 12), the S.E. of the
interquartile range is, in virtue of § 32 (40),

$$\sqrt{\epsilon^2 + \epsilon^2 - 2\cdot\tfrac{1}{3}\epsilon^2} = 2\epsilon/\sqrt{3}.$$

The S.E. of the semi-interquartile range is therefore
$\epsilon/\sqrt{3} = 1.363\sigma/\sqrt{3n} = 0.787\sigma/\sqrt{n}$.

14. The quantities $x_i$ ($i = 1, 2, \ldots, n$) are independent values
chosen at random, one from each of n populations with the same
mean, and with variances $\sigma_i^2$. Show that the linear estimate of the
common mean, which has the least sampling variance, is that
obtained by weighting the x's inversely as their variances. Also
prove that this minimum variance is H/n, where H is the harmonic
mean of the variances $\sigma_i^2$. (Cf. Ex. iv, 5.)

15. From a finite population of N values, with variance $\sigma^2$, a
random sample of n values is drawn without replacements. Show
that the sampling variance of the mean of the sample is

$$\frac{(N-n)\sigma^2}{n(N-1)}.$$

16. Variable Poissonian sampling. The three types of sampling
considered in § 47 are all included in the following. Suppose that, in
the drawing of a Poissonian sample, there are various types of
sampling, each type having its own probability. Let $w_k$ be the
probability of the kth type of Poissonian sample, $n_k$ the size of a
sample of this type, and $p_{kj}$ ($j = 1, \ldots, n_k$) the probabilities of success
at the individual drawings in this type, so that the expected number
of successes in such a sample is $x_k = \sum_j p_{kj}$. In virtue of § 47 (4) the
variance of the number of successes per sample in this type is

$$x_k - \frac{x_k^2}{n_k} - n_k\sigma_k^2,$$

where $\sigma_k^2$ is the variance of the $p_{kj}$ for the kth type of sampling. The
expected number of successes per sample when all types are
possible is

$$\bar{x} = \sum_k w_k x_k,$$

and the variance of the number of successes per sample for all types,
being the mean square deviation of the number of successes per
sample from $\bar{x}$, is given by

$$\sigma^2 = \sum_k w_k\left[(x_k - \bar{x})^2 + x_k - \frac{x_k^2}{n_k} - n_k\sigma_k^2\right] = \sigma_x^2 + \sum_k w_k\left(x_k - \frac{x_k^2}{n_k} - n_k\sigma_k^2\right),$$

where $\sigma_x^2$ is the variance of the $x_k$ for all types. Hence the general
formula*

$$\sigma^2 = \sigma_x^2 + \bar{x} - \sum_k w_k\frac{x_k^2}{n_k} - \sum_k w_k n_k\sigma_k^2.$$

In the particular case, for example, in which all the $p_{kj}$ have a
common value, p, we have

$$x_k = n_k p, \qquad \sigma_k^2 = 0, \qquad \bar{x} = p\sum_k w_k n_k, \qquad \sigma_x^2 = p^2\sigma_n^2,$$

where $\sigma_n^2$ denotes the variance of the $n_k$, and the above formula becomes

$$\sigma^2 = p^2\sigma_n^2 + p\sum_k w_k n_k - p^2\sum_k w_k n_k = p^2\sigma_n^2 + pq\sum_k w_k n_k,$$

which is § 47 (8).

17. In a certain population the proportion of members possessing 
a given characteristic is p. Prove that, if p may be varied, the 
probability of obtaining m such members in a simple sample of n 
from the population is greatest when p = m/n.

18. Prove that, in simple sampling from a population, the 
expected value of the rth moment of the sample about any fixed 
value is equal to the rth moment of the population about that 
value. 

* Cf. Aitken, 1939, 1, pp. 53-4 and Coolidge, 1925, 2, pp. 66-72.



CHAPTER VII 
STANDARD ERRORS OF STATISTICS 

54. Notation. Variances of population and sample 

Having already considered the S.E.'s of the mean and the partition
values, we now pass to those of various other statistics. We shall
argue in terms of a population of discrete values $x_i$ ($i = 1, 2, \ldots, k$),
the relative frequency of the value $x_i$ in the population being $p_i$.
The argument covers approximately the case in which the values
of the population are grouped in k classes of small class interval,
the ith class being centred at the value $x_i$, and having a relative
frequency $p_i$. The sum of the relative frequencies of all the other
classes is therefore $q_i$, where as usual $p_i + q_i = 1$. Then, as in § 50,
the mean $\mu$ of the population is given by

$$\mu = \sum_i p_i x_i \qquad (1)$$

and its variance $\sigma^2$ by

$$\sigma^2 = \sum_i p_i(x_i - \mu)^2 = \sum_i p_i x_i^2 - \mu^2. \qquad (2)$$

Consider now a simple sample of n members drawn from this
population. The sample values fall into the same classes as above;
that is to say, the values $x_i$ are the same for the sample as for the
population, but the relative frequencies of the classes vary with the
sample, fluctuating about the corresponding values for the
population. The probability that, in simple sampling, a value drawn will
belong to the ith class is $p_i$. Hence the frequency $f_i$ of that class, in
a sample of n members, has mean value $np_i$, since this is the expected
value of the number of successes in n independent trials with
constant probability $p_i$ of success. Thus

$$E(f_i) = np_i. \qquad (3)$$

The mean $\bar{x}$ of the sample is given by

$$\bar{x} = \frac{1}{n}\sum_i f_i x_i \qquad (4)$$

and its variance $S^2$ by

$$S^2 = \frac{1}{n}\sum_i f_i(x_i - \bar{x})^2. \qquad (5)$$

We may notice at the outset that the expected value of $S^2$ in
sampling is not $\sigma^2$ but $(n-1)\sigma^2/n$. To prove this we observe that
$S^2$, being the variance of the sample, is the second moment of the
sample about $\mu$, diminished by $(\bar{x} - \mu)^2$. Thus

$$S^2 = \frac{1}{n}\sum_i f_i(x_i - \mu)^2 - (\bar{x} - \mu)^2.$$

Consider the expectation of both members of this equation. In the
first term on the right $(x_i - \mu)$ remains constant during sampling;
and therefore, in virtue of (3), the expected value of this term is

$$\frac{1}{n}\sum_i np_i(x_i - \mu)^2 = \sigma^2.$$

Further, $E(\bar{x} - \mu)^2$ is the sampling variance of the mean, and is
therefore equal to $\sigma^2/n$. Substituting these values we obtain

$$E(S^2) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\sigma^2 \qquad (6)$$

as stated above. If then we write

$$s^2 = \frac{n}{n-1}S^2 = \frac{1}{n-1}\sum_i f_i(x_i - \bar{x})^2, \qquad (7)$$

our result is

$$E(s^2) = \sigma^2. \qquad (8)$$

We say that $s^2$ is an unbiased estimate of $\sigma^2$, because its mean value
in sampling is $\sigma^2$. It is in this sense a better estimate of the
population variance than $S^2$. Incidentally also it follows from the above
argument that the second moment of the sample, about the mean
of the population, has $\sigma^2$ for its mean value.
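
The results (6) and (8) can be checked exactly for a small discrete population by enumerating every ordered sample and weighting it by its probability. The following Python sketch does this; the values $x_i$, the relative frequencies $p_i$ and the sample size are illustrative assumptions, not taken from the text.

```python
from itertools import product
from math import prod

def expected_sample_variance(values, probs, n):
    """Exact E(S^2) for simple samples of size n, found by enumerating
    every ordered sample and weighting it by its probability."""
    total = 0.0
    for idx in product(range(len(values)), repeat=n):
        weight = prod(probs[i] for i in idx)        # probability of this ordered sample
        xs = [values[i] for i in idx]
        mean = sum(xs) / n
        S2 = sum((x - mean) ** 2 for x in xs) / n   # divisor n, as in (5)
        total += weight * S2
    return total

values = [0.0, 1.0, 3.0]   # assumed class values x_i
probs = [0.5, 0.3, 0.2]    # assumed relative frequencies p_i
n = 4                      # assumed sample size

mu = sum(p * x for p, x in zip(probs, values))
sigma2 = sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

ES2 = expected_sample_variance(values, probs, n)   # compare with (n - 1) sigma^2 / n
Es2 = ES2 * n / (n - 1)                            # compare with sigma^2
```

With these assumed inputs `ES2` agrees with $(n-1)\sigma^2/n$ and `Es2` with $\sigma^2$ to rounding error, which is the content of (6) and (8).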

55. Standard errors of class frequencies 

We have seen that, in simple sampling from the above population,
the probability that a value will fall into the ith class of the sample
is $p_i$, and that the frequency $f_i$ of this class has mean value $np_i$.
The deviation of this frequency from its mean value will be denoted
by $\delta f_i$. The S.E. of $f_i$ is the square root of the expected value of
$(\delta f_i)^2$; and, since $f_i$ is the number of successes in n trials of constant
probability $p_i$, its variance is $np_iq_i$. Thus

$$E(\delta f_i)^2 = np_i(1 - p_i). \qquad (9)$$

If $p_i$ is unknown, an estimate of its value derived from the sample
is $f_i/n$; and, as an approximation to the above sampling variance,
we have

$$E(\delta f_i)^2 \approx f_i(1 - f_i/n). \qquad (10)$$

But this is a fair approximation only if n is large. A better estimate
is obtained by multiplying this expression by $n/(n-1)$. To prove
this we consider the expected value of the second member of (10).
Since $E(f_i^2)$ is the second moment of $f_i$ about zero in its sampling
distribution, we have

$$E(f_i^2) = \text{sampling variance of } f_i + [E(f_i)]^2 = np_i(1 - p_i) + n^2p_i^2.$$

Consequently

$$E[f_i(1 - f_i/n)] = np_i - p_i(1 - p_i) - np_i^2 = (n-1)p_i(1 - p_i) = \frac{n-1}{n}E(\delta f_i)^2.$$

Hence the formula

$$E(\delta f_i)^2 \approx \frac{n}{n-1}f_i(1 - f_i/n) \qquad (10')$$

gives a better estimate than (10), since the expected value of the
second member of (10') is equal to the first member, so that it is an
unbiased estimate of the sampling variance of $f_i$. For a large sample
the factor $n/(n-1)$ is nearly equal to unity, and (10) gives a
sufficiently good estimate.

56. Covariance of the frequencies in different classes 

Since the sum of the class frequencies in any sample is constant,
for samples of given size n, it follows that

$$\sum_i \delta f_i = 0. \qquad (11)$$

The covariance between the frequencies $f_i$ and $f_j$ of the ith and jth
classes is the covariance, $E(\delta f_i\,\delta f_j)$, of their deviations $\delta f_i$ and $\delta f_j$
from their mean values $np_i$ and $np_j$. The calculation of this
covariance may be associated with a correlation table in the values
of $\delta f_i$ and $\delta f_j$. Consider the array in which $\delta f_i$ has a fixed value.
Bearing (11) in mind we assume that, in all the samples for which
$\delta f_i$ has this fixed value, the excess $\delta f_i$ produces an equal deficiency
which is distributed among the other class frequencies, on the
average, in proportion to their expected values; that is to say, we
assume that, in the array with constant $\delta f_i$, the average value
of $\delta f_j$ is

$$\overline{\delta f_j} = -\frac{p_j}{q_i}\,\delta f_i. \qquad (12)$$

In calculating the covariance we may, as proved in § 33, replace
each value of $\delta f_j$ in the array by the mean value $\overline{\delta f_j}$ for that array.
Then

$$E(\delta f_i\,\delta f_j) = -\frac{p_j}{q_i}E(\delta f_i)^2 = -np_ip_j \qquad (13)$$

in virtue of (9). This is the required covariance.

If the values of $p_i$ and $p_j$ are unknown, an approximation is
obtained by taking them as $f_i/n$ and $f_j/n$. The second member of
(13) is then replaced by $-f_if_j/n$. A better estimate, however, is
given by

$$E(\delta f_i\,\delta f_j) \approx -\frac{f_if_j}{n-1}. \qquad (14)$$

This may be shown by determining the expected value of the second
member. Thus, in virtue of § 23 (5),

$$E(f_if_j) = E(f_i)E(f_j) + E(\delta f_i\,\delta f_j),$$

and therefore, by (13) and (3),

$$E(f_if_j) = n^2p_ip_j - np_ip_j = n(n-1)p_ip_j = -(n-1)E(\delta f_i\,\delta f_j).$$

Consequently, (14) gives an unbiased estimate of the covariance,
since the expected value of the second member is equal to the
covariance. In the case of a large sample, the denominator $n - 1$ may
be treated as n.
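
As a numerical check on (9), (13) and (14), the exact expectations can be computed for a small case by enumerating all ordered samples. The sketch below is only an illustration; the class probabilities and the sample size are assumed values chosen to keep the enumeration short.

```python
from itertools import product
from math import prod

def frequency_expectations(probs, n):
    """Exact E(df_i df_j) and E(f_i f_j) over all ordered samples of size n."""
    k = len(probs)
    cov = [[0.0] * k for _ in range(k)]
    raw = [[0.0] * k for _ in range(k)]
    for idx in product(range(k), repeat=n):
        w = prod(probs[c] for c in idx)               # probability of this ordered sample
        f = [idx.count(c) for c in range(k)]          # class frequencies f_i
        d = [f[c] - n * probs[c] for c in range(k)]   # deviations from the means n p_i
        for i in range(k):
            for j in range(k):
                cov[i][j] += w * d[i] * d[j]
                raw[i][j] += w * f[i] * f[j]
    return cov, raw

probs = [0.5, 0.3, 0.2]   # assumed p_i
n = 4                     # assumed sample size
cov, raw = frequency_expectations(probs, n)

var_0 = cov[0][0]               # compare with n p_0 (1 - p_0), formula (9)
cov_01 = cov[0][1]              # compare with -n p_0 p_1, formula (13)
est_14 = -raw[0][1] / (n - 1)   # expected value of the estimate (14)
```

Here `est_14` coincides with `cov_01`, which is the sense in which (14) is unbiased.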

57. Standard errors in moments about a fixed value 

It is customary to employ Greek symbols for moments of the
population, and the corresponding Italics for those of the sample.
Thus $\mu_r$ and $\mu'_r$ denote respectively the rth moment of the population
about the mean, and about some other specified value, while $m_r$
and $m'_r$ denote the corresponding moments of the sample. For the
rth moment of the sample about x = 0 we have

$$nm'_r = \sum_i f_i x_i^r.$$

Hence, owing to deviations $\delta f_i$ of the class frequencies from their
mean values $np_i$, we have a deviation $\delta m'_r$ of the rth moment from
its mean value, given by

$$n\,\delta m'_r = \sum_i x_i^r\,\delta f_i.$$

On squaring both sides we obtain

$$n^2(\delta m'_r)^2 = \sum_i x_i^{2r}(\delta f_i)^2 + {\sum_{i,j}}' x_i^r x_j^r\,\delta f_i\,\delta f_j,$$

where ${\sum}'_{i,j}$ denotes summation over all integral values of i and j from
1 to k, except those for which i = j. Taking expected values of both
members of this equation we deduce, since the values $x_i$ are the
same for each sample,

$$n^2E(\delta m'_r)^2 = \sum_i x_i^{2r}\,np_i(1 - p_i) - {\sum_{i,j}}' x_i^r x_j^r\,np_ip_j.$$

Consequently, on dividing by n and rearranging,

$$nE(\delta m'_r)^2 = \sum_i p_i x_i^{2r} - \Big(\sum_i p_i x_i^r\Big)^2 = \mu'_{2r} - \mu'^{\,2}_r,$$

the moments $\mu'_{2r}$ and $\mu'_r$ being those of the population. Hence the
required formula

$$E(\delta m'_r)^2 = \frac{1}{n}\big(\mu'_{2r} - \mu'^{\,2}_r\big). \qquad (15)$$

If, however, on taking expected values as above we use the estimates
(10') and (14), we obtain the unbiased estimate of the sampling
variance of $m'_r$,

$$E(\delta m'_r)^2 \approx \frac{1}{n-1}\big(m'_{2r} - m'^{\,2}_r\big) \qquad (16)$$

in terms of the moments of the sample.

Since the mean of a distribution is the first moment about x = 0,
we may deduce the variance of the mean of a sample of n members
by putting r = 1 in the above. Thus (15) becomes

$$E(\delta\bar{x})^2 = \frac{1}{n}\big(\mu'_2 - \mu'^{\,2}_1\big) = \frac{\sigma^2}{n}, \qquad (17)$$

and (16) gives, in terms of sample moments,

$$E(\delta\bar{x})^2 \approx \frac{1}{n-1}\big(m'_2 - \bar{x}^2\big) = \frac{S^2}{n-1} = \frac{s^2}{n}, \qquad (18)$$

in agreement with (8).



58. Covariance of moments of different orders about a fixed value 

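The calculation of the covariance of the qth and rth moments of
the sample about the fixed value x = 0 is similar to the above. With
the same notation we have

$$nm'_q = \sum_i f_i x_i^q, \qquad nm'_r = \sum_i f_i x_i^r,$$

and therefore

$$n\,\delta m'_q = \sum_i x_i^q\,\delta f_i, \qquad n\,\delta m'_r = \sum_i x_i^r\,\delta f_i.$$

On multiplying corresponding sides of these equations we obtain

$$n^2\,\delta m'_q\,\delta m'_r = \sum_i x_i^{q+r}(\delta f_i)^2 + {\sum_{i,j}}' x_i^q x_j^r\,\delta f_i\,\delta f_j.$$

Now take the expected value of each member of the equation.
Then, in virtue of (9) and (13),

$$n^2E(\delta m'_q\,\delta m'_r) = \sum_i x_i^{q+r}\,np_i(1 - p_i) - {\sum_{i,j}}' x_i^q x_j^r\,np_ip_j,$$

and consequently, on dividing by n and rearranging,

$$nE(\delta m'_q\,\delta m'_r) = \sum_i p_i x_i^{q+r} - \Big(\sum_i p_i x_i^q\Big)\Big(\sum_j p_j x_j^r\Big) = \mu'_{q+r} - \mu'_q\mu'_r.$$

Hence the required formula

$$E(\delta m'_q\,\delta m'_r) = \frac{1}{n}\big(\mu'_{q+r} - \mu'_q\mu'_r\big) \qquad (19)$$

in terms of the moments of the population. If, however, on taking
expected values as above, we use the approximations (10') and (14),
we obtain the unbiased estimate

$$E(\delta m'_q\,\delta m'_r) \approx \frac{1}{n-1}\big(m'_{q+r} - m'_q m'_r\big) \qquad (20)$$

in terms of moments of the sample.

59. Standard errors of the variance and the standard deviation
of a large sample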

The mean of a sample changes with the sample. Hence we cannot
use (15) to obtain the sampling variance of the second moment
about the mean of the sample. In the case of large samples we may
obtain the required result as follows. The variance $m_2$ of the sample
is connected with the second moment $m'_2$ about x = 0 by the usual
relation

$$m_2 = m'_2 - \bar{x}^2.$$

Consequently

$$\delta m_2 = \delta m'_2 - 2\bar{x}\,\delta\bar{x},$$

and therefore, on squaring,

$$(\delta m_2)^2 = (\delta m'_2)^2 - 4\bar{x}\,\delta\bar{x}\,\delta m'_2 + 4\bar{x}^2(\delta\bar{x})^2, \qquad \text{(i)}$$

where

$$\delta\bar{x} = \frac{1}{n}\sum_i x_i\,\delta f_i, \qquad \delta m'_2 = \frac{1}{n}\sum_i x_i^2\,\delta f_i. \qquad \text{(ii)}$$

Suppose now that the origin of x is taken to coincide with the mean
of the population. Then $\bar{x}$ becomes identical with $\delta\bar{x}$, for it is now the
deviation of the mean of the sample from the mean of the
population. If we take expected values of both members of (i) we may
show that, when n is large, the contributions of the second and third
terms on the right are small compared with that of the first term.
By (15), since the origin is now the mean of the population, the
expected value of $(\delta m'_2)^2$ is $(\mu_4 - \mu_2^2)/n$. Also $\bar{x}^2(\delta\bar{x})^2$ is $(\delta\bar{x})^4$, and its
expected value is of the order $1/n^2$, in virtue of (17). Similarly,
$\bar{x}\,\delta\bar{x}\,\delta m'_2$ is $(\delta\bar{x})^2\,\delta m'_2$ and its expected value is of the order $1/n^{3/2}$, in
virtue of (15) and (17). If then n is large, the expectation of the
first term on the right of (i) is large compared with those of the
second and third terms, and we have the approximate formula*

$$E(\delta m_2)^2 = \frac{1}{n}\big(\mu_4 - \mu_2^2\big). \qquad (21)$$

* Fisher has shown that, for samples of any size, the exact sampling
variance of $s^2$ (the unbiased estimate of population variance) is
$(\mu_4 - 3\mu_2^2)/n + 2\mu_2^2/(n-1)$ (Fisher, 1929, 1, p. 206).
For a normal population this expression has the value $2\sigma^4/(n-1)$.

The S.E. of the variance of the sample is the square root of this
quantity. In the particular case of a normal population, with S.D. $\sigma$,
we have $\mu_4 = 3\sigma^4$ and $\mu_2 = \sigma^2$; so that, for large samples from a
normal population,

$$E(\delta m_2)^2 = 2\sigma^4/n. \qquad (22)$$

From (21) may be deduced the S.E. of the S.D. of large samples.
For the variance of the sample we have, with the above notation,
$m_2 = S^2$ and therefore

$$\delta m_2 = 2S\,\delta S.$$

For large samples the factor S may be taken as equal to the
population parameter $\sigma$, $\delta S$ being small. Hence on squaring, and taking
expected values of both members, we have

$$E(\delta m_2)^2 = 4\sigma^2E(\delta S)^2,$$

and therefore, in virtue of (21),

$$E(\delta S)^2 = \frac{\mu_4 - \mu_2^2}{4n\sigma^2} = \frac{\mu_4 - \mu_2^2}{4n\mu_2}. \qquad (23)$$

In the case of large samples from a normal population this is simply

$$E(\delta S)^2 = \frac{\sigma^2}{2n}, \qquad (24)$$

and the S.E. of the S.D. is, in this case, $\sigma/\sqrt{2n}$.



60. Comparison of the standard deviations of two large samples 

For two independent large samples of $n_1$ and $n_2$ members from
the same population, the variance $\epsilon^2$ of the difference of their
standard deviations, being the sum of the variances of their standard
deviations, is in virtue of (23),

$$\epsilon^2 = \frac{\mu_4 - \mu_2^2}{4\mu_2}\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big). \qquad (25)$$

In particular, if the population is normal, this equation takes the
simple form

$$\epsilon^2 = \frac{\sigma^2}{2}\Big(\frac{1}{n_1} + \frac{1}{n_2}\Big). \qquad (26)$$

The expected value of the difference of the S.D.'s of two such
samples is zero. Hence, by comparing the actual difference $S_1 - S_2$
with the S.E. $\epsilon$, we may test in the usual manner the credibility of
the hypothesis that the samples were drawn from the same
population.

For two large simple samples from different populations, whose
moments are $\mu_4$, $\mu_2$ and $\bar{\mu}_4$, $\bar{\mu}_2$ respectively, the S.E. of the difference
of their standard deviations is given by

$$\epsilon^2 = \frac{\mu_4 - \mu_2^2}{4n_1\mu_2} + \frac{\bar{\mu}_4 - \bar{\mu}_2^2}{4n_2\bar{\mu}_2}. \qquad (27)$$

And in particular, if the populations are normal with standard
deviations $\sigma_1$ and $\sigma_2$ respectively,

$$\epsilon^2 = \frac{\sigma_1^2}{2n_1} + \frac{\sigma_2^2}{2n_2}. \qquad (28)$$

Example. The S.D. of a simple sample of 2,000 members is 5.9 years, and
that of an independent sample of 2,500 members is 6.1 years. May the
samples be reasonably regarded as from the same normal population?

On the hypothesis that they are from the same normal population, an
approximate value for its S.D. is 6 years, and the S.E. of the difference of the
standard deviations of samples of the above sizes is, by (26),

$$\epsilon = 6\sqrt{\tfrac{1}{4000} + \tfrac{1}{5000}} = 6(0.0212) = 0.127 \text{ year.}$$

The actual difference of 0.2 year is less than $1.6\epsilon$, and is therefore not
significant at the 5 % level of probability. There is thus no real evidence against
the hypothesis.
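
The arithmetic of this example, together with the reduction of the general large-sample formula (23) to the normal case (24), can be reproduced with a few lines of Python. This is a sketch only; the function names are hypothetical and chosen here for illustration.

```python
from math import sqrt

def se_sd(mu4, mu2, n):
    """Large-sample S.E. of the sample S.D., as in formula (23)."""
    return sqrt((mu4 - mu2 ** 2) / (4 * n * mu2))

def se_sd_difference_normal(sigma, n1, n2):
    """S.E. of the difference of two sample S.D.'s for two independent
    large samples from the same normal population, as in formula (26)."""
    return sqrt(sigma ** 2 / 2 * (1 / n1 + 1 / n2))

sigma = 6.0                                        # approximate common S.D. (years)
eps = se_sd_difference_normal(sigma, 2000, 2500)   # about 0.127 year
ratio = (6.1 - 5.9) / eps                          # observed difference in units of eps

# For a normal population mu4 = 3 sigma^4 and mu2 = sigma^2,
# so (23) should reduce to sigma / sqrt(2n), as in (24).
normal_case = se_sd(3 * sigma ** 4, sigma ** 2, 2000)
```

The value of `ratio` falls a little below 1.6, in line with the conclusion drawn in the example.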



SAMPLING FROM A BIVARIATE POPULATION 

61. Sampling covariance of the means of the variables 

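Suppose now that the population is bivariate, with $p_i$ as relative
frequency of the pair of values $(x_i, y_i)$, or of the class with centre
at that point. The means of x and y in the population are then

$$\mu = \sum_i p_i x_i, \qquad \mu' = \sum_i p_i y_i,$$

and the moments of orders q and r in x and y respectively are

$$\mu'_{q,r} = \sum_i p_i x_i^q y_i^r, \qquad \mu_{q,r} = \sum_i p_i(x_i - \mu)^q(y_i - \mu')^r,$$

the first about the point (0, 0), and the second about the mean of
the population. In particular $\mu_{1,1}$ is the covariance of x and y in
the population.

In a simple sample of n pairs of values, let $f_i$ be the frequency of
the ith class. Then the values of the moments of the sample are
given by

$$m'_{q,r} = \frac{1}{n}\sum_i f_i x_i^q y_i^r, \qquad m_{q,r} = \frac{1}{n}\sum_i f_i(x_i - \bar{x})^q(y_i - \bar{y})^r,$$

the covariance of x and y in the sample being $m_{1,1}$. The argument
of §§ 55 and 56 still holds, and the formulae

$$E(\delta f_i)^2 = np_i(1 - p_i), \qquad E(\delta f_i\,\delta f_j) = -np_ip_j$$

are still valid.

Corresponding to (17) we may now prove that the covariance of
the means $\bar{x}$, $\bar{y}$, in samples from a bivariate population, is $\mu_{1,1}/n$. For

$$n\bar{x} = \sum_i f_i x_i, \qquad n\bar{y} = \sum_i f_i y_i.$$

Hence the deviations $\delta\bar{x}$, $\delta\bar{y}$ from $\mu$, $\mu'$ respectively, and the
deviations $\delta f_i$ from their mean values $np_i$, are connected by

$$n\,\delta\bar{x} = \sum_i x_i\,\delta f_i, \qquad n\,\delta\bar{y} = \sum_i y_i\,\delta f_i,$$

and therefore on multiplication

$$n^2\,\delta\bar{x}\,\delta\bar{y} = \sum_i x_i y_i(\delta f_i)^2 + {\sum_{i,j}}' x_i y_j\,\delta f_i\,\delta f_j.$$

Taking the expected value of each member we deduce

$$n^2E(\delta\bar{x}\,\delta\bar{y}) = \sum_i x_i y_i\,np_i(1 - p_i) - {\sum_{i,j}}' x_i y_j\,np_ip_j,$$

and therefore

$$nE(\delta\bar{x}\,\delta\bar{y}) = \sum_i p_i x_i y_i - \Big(\sum_i p_i x_i\Big)\Big(\sum_j p_j y_j\Big) = \mu'_{1,1} - \mu\mu' = \mu_{1,1}.$$

Hence the required result

$$E(\delta\bar{x}\,\delta\bar{y}) = \mu_{1,1}/n. \qquad (29)$$

From this it follows immediately that the coefficient of correlation
between $\bar{x}$ and $\bar{y}$ is equal to the correlation $\rho$ between x and y in the
population. For the correlation between $\bar{x}$ and $\bar{y}$ is

$$\frac{\text{covariance of } \bar{x} \text{ and } \bar{y}}{(\text{S.E. of } \bar{x})(\text{S.E. of } \bar{y})} = \frac{\mu_{1,1}/n}{(\sigma_1/\sqrt{n})(\sigma_2/\sqrt{n})} = \frac{\mu_{1,1}}{\sigma_1\sigma_2} = \rho,$$

$\sigma_1^2$ and $\sigma_2^2$ being the variances of x and y in the population.

We may also prove that the expected value of the covariance of
x and y in the sample is given by

$$E(m_{1,1}) = \frac{n-1}{n}\mu_{1,1}, \qquad (30)$$

which corresponds to (6). For, by the known relation, § 23 (5),

$$m_{1,1} = \frac{1}{n}\sum_i f_i(x_i - \mu)(y_i - \mu') - (\bar{x} - \mu)(\bar{y} - \mu').$$

Taking the expected value of each member we have, in virtue of (29),

$$E(m_{1,1}) = \mu_{1,1} - \frac{\mu_{1,1}}{n} = \frac{n-1}{n}\mu_{1,1}$$

as required.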

62. Variance and covariance of moments about a fixed point 

The moment of the sample about the fixed point (0, 0), of order
q in x and r in y, is given by

$$m'_{q,r} = \frac{1}{n}\sum_i f_i x_i^q y_i^r,$$

and therefore, with the usual notation,

$$n\,\delta m'_{q,r} = \sum_i x_i^q y_i^r\,\delta f_i.$$

Squaring both sides, and taking expected values as in § 57, we deduce

$$n^2E(\delta m'_{q,r})^2 = \sum_i x_i^{2q} y_i^{2r}\,np_i(1 - p_i) - {\sum_{i,j}}' x_i^q y_i^r x_j^q y_j^r\,np_ip_j,$$

and therefore

$$E(\delta m'_{q,r})^2 = \frac{1}{n}\big(\mu'_{2q,2r} - \mu'^{\,2}_{q,r}\big) \qquad (31)$$

in terms of the moments of the population. This is the sampling
variance of $m'_{q,r}$. If, in taking expected values, we use the
approximations (10') and (14), we obtain the unbiased estimate

$$E(\delta m'_{q,r})^2 \approx \frac{1}{n-1}\big(m'_{2q,2r} - m'^{\,2}_{q,r}\big) \qquad (32)$$

in terms of the moments of the sample.

Similarly, we may find the sampling covariance of the moments
$m'_{q,r}$ and $m'_{s,t}$ about (0, 0), by multiplying together the expressions
for $\delta m'_{q,r}$ and $\delta m'_{s,t}$ and taking expected values of both sides.
Proceeding as above we obtain the result

$$E(\delta m'_{q,r}\,\delta m'_{s,t}) = \frac{1}{n}\big(\mu'_{q+s,r+t} - \mu'_{q,r}\mu'_{s,t}\big). \qquad (33)$$

In particular by putting r = s = 0 and q = t = 1, and observing
that $\mu'_{1,0} = \mu$, $\mu'_{0,1} = \mu'$, we find again the covariance of the means

$$E(\delta\bar{x}\,\delta\bar{y}) = \frac{1}{n}\big(\mu'_{1,1} - \mu\mu'\big) = \frac{\mu_{1,1}}{n}.$$

As in other cases we may deduce the unbiased estimate
corresponding to (33), with sample moments and denominator $n - 1$.

63. Standard error of the covariance of a large sample 

By argument similar to that of § 59 we may find the S.E. of the
covariance of a large sample from a bivariate population. For, with
the usual notation, since

$$m_{1,1} = m'_{1,1} - \bar{x}\bar{y},$$

we have

$$\delta m_{1,1} = \delta m'_{1,1} - \bar{x}\,\delta\bar{y} - \bar{y}\,\delta\bar{x}.$$

If now we take the origin at the mean of the population, $\bar{x}$ becomes
identical with $\delta\bar{x}$ and $\bar{y}$ with $\delta\bar{y}$. Consequently, if we retain only the
principal terms we have, on squaring the last equation and taking
expected values,

$$E(\delta m_{1,1})^2 = E(\delta m'_{1,1})^2,$$

and, since the origin is at the mean of the population, this is, in
virtue of (31),

$$E(\delta m_{1,1})^2 = \frac{1}{n}\big(\mu_{2,2} - \mu_{1,1}^2\big), \qquad (34)$$

giving the square of the S.E. of the covariance of the sample.

64. Standard error of the coefficient of correlation 

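The S.E. of the coefficient of correlation, r, for large samples of
n pairs from a bivariate population, may be found as follows.*

* Cf. Bowley, 1920, 2, pp. 422-3.

Omitting the comma between the two subscripts in the moment
symbols we have, by the usual formula,

$$r = m_{11}/\sqrt{m_{20}m_{02}},$$

and therefore, by logarithmic differentiation,

$$\frac{\delta r}{r} = \frac{\delta m_{11}}{m_{11}} - \frac{1}{2}\frac{\delta m_{20}}{m_{20}} - \frac{1}{2}\frac{\delta m_{02}}{m_{02}}. \qquad \text{(i)}$$

If the origin is taken at the mean of the population, so that $\bar{x}$ and
$\bar{y}$ are identical with $\delta\bar{x}$ and $\delta\bar{y}$, and we retain only small quantities
of the first order, we find as in the preceding section

$$\delta m_{11} = \delta m'_{11} = \frac{1}{n}\sum_i x_i y_i\,\delta f_i.$$

Similarly, from the relations

$$m_{20} = m'_{20} - \bar{x}^2, \qquad m_{02} = m'_{02} - \bar{y}^2,$$

we find, on retaining only the principal terms,

$$\delta m_{20} = \frac{1}{n}\sum_i x_i^2\,\delta f_i, \qquad \delta m_{02} = \frac{1}{n}\sum_i y_i^2\,\delta f_i.$$

Now substitute these values in (i). Then, since n is large and we are
retaining only small quantities of the first order, we may replace
the denominators in (i) by the moments and correlation coefficient
$\rho$ for the population. Thus

$$\frac{\delta r}{\rho} = \frac{1}{n}\sum_i\Big[\frac{x_i y_i}{\mu_{11}} - \frac{x_i^2}{2\mu_{20}} - \frac{y_i^2}{2\mu_{02}}\Big]\delta f_i = \frac{1}{n}\sum_i F(x_i, y_i)\,\delta f_i,$$

where $F(x_i, y_i)$ denotes the expression in brackets. Squaring both
sides we have

$$\frac{n^2(\delta r)^2}{\rho^2} = \sum_i F^2(x_i, y_i)(\delta f_i)^2 + {\sum_{i,j}}' F(x_i, y_i)F(x_j, y_j)\,\delta f_i\,\delta f_j,$$

and therefore, on taking expected values,

$$\frac{n}{\rho^2}E(\delta r)^2 = \sum_i p_i F^2(x_i, y_i) - \Big[\sum_i p_i F(x_i, y_i)\Big]^2.$$

Now the second sum on the right is equal to zero. For

$$\sum_i p_i F(x_i, y_i) = \frac{\mu_{11}}{\mu_{11}} - \frac{\mu_{20}}{2\mu_{20}} - \frac{\mu_{02}}{2\mu_{02}} = 0.$$

On substituting the value of $F(x_i, y_i)$ we deduce the sampling
variance of r in the form

$$E(\delta r)^2 = \frac{\rho^2}{n}\Big[\frac{\mu_{22}}{\mu_{11}^2} + \frac{\mu_{40}}{4\mu_{20}^2} + \frac{\mu_{04}}{4\mu_{02}^2} - \frac{\mu_{31}}{\mu_{11}\mu_{20}} - \frac{\mu_{13}}{\mu_{11}\mu_{02}} + \frac{\mu_{22}}{2\mu_{20}\mu_{02}}\Big]. \qquad (35)$$

This result assumes a very simple form in the case of a bivariate
normal population. The parameter values are then

$$\mu_{20} = \sigma_1^2, \quad \mu_{02} = \sigma_2^2, \quad \mu_{11} = \rho\sigma_1\sigma_2, \quad \mu_{40} = 3\sigma_1^4, \quad \mu_{04} = 3\sigma_2^4,$$
$$\mu_{31} = 3\rho\sigma_1^3\sigma_2, \quad \mu_{13} = 3\rho\sigma_1\sigma_2^3, \quad \mu_{22} = (1 + 2\rho^2)\sigma_1^2\sigma_2^2,$$

and on substituting these values we obtain

$$E(\delta r)^2 = \frac{1}{n}(1 - \rho^2)^2.$$

Consequently

$$\text{S.E. of } r = \frac{1 - \rho^2}{\sqrt{n}}. \qquad (36)$$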
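
The passage from the general expression (35) to the normal-population result (36) is a matter of substitution, and can be confirmed numerically. In the sketch below the bracket of (35) is written out in the form obtained by squaring $F$, and the bivariate normal moments are the standard fourth-order values; the sample size, $\rho$ and the two standard deviations are illustrative assumptions.

```python
from math import sqrt

def var_r_general(n, rho, mu):
    """Large-sample variance of r as in (35); mu maps 'qr' to the moment mu_qr."""
    bracket = (mu["22"] / mu["11"] ** 2
               + mu["40"] / (4 * mu["20"] ** 2)
               + mu["04"] / (4 * mu["02"] ** 2)
               - mu["31"] / (mu["11"] * mu["20"])
               - mu["13"] / (mu["11"] * mu["02"])
               + mu["22"] / (2 * mu["20"] * mu["02"]))
    return rho ** 2 / n * bracket

def normal_moments(s1, s2, rho):
    """Moments about the mean of a bivariate normal population, up to order four."""
    return {"20": s1 ** 2, "02": s2 ** 2, "11": rho * s1 * s2,
            "40": 3 * s1 ** 4, "04": 3 * s2 ** 4,
            "31": 3 * rho * s1 ** 3 * s2, "13": 3 * rho * s1 * s2 ** 3,
            "22": (1 + 2 * rho ** 2) * s1 ** 2 * s2 ** 2}

def se_r_normal(rho, n):
    """S.E. of r for large samples from a bivariate normal population, as in (36)."""
    return (1 - rho ** 2) / sqrt(n)

n, rho = 3600, 0.45   # assumed sample size and population correlation
general = sqrt(var_r_general(n, rho, normal_moments(2.0, 5.0, rho)))
simple = se_r_normal(rho, n)   # about 0.0133
```

The two values agree whatever standard deviations are supplied, since $\sigma_1$ and $\sigma_2$ cancel from every term of the bracket.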

These formulae hold for large values of n. Their application,
however, is limited by the fact that the sampling distribution of r
is not even approximately normal when r is fairly large. The S.E.
test for r should therefore be applied only for large samples and
moderate values of r. A much more useful test, which is applicable
to small or large samples, will be considered in Chapter X.

EXAMPLES VII

1. In a sample of 10,000 from a normal population the S.D. is
2.52 cm. Show that the S.E. of this quantity is 0.018 cm. nearly, and
hence that the S.D. of the population almost certainly lies between
2.46 and 2.58 cm.

2. A random sample of 6,400 members from a certain population
has a S.D. of 6.80 years, and a fourth moment of 3298 yr$^4$. Show that
three times the S.E. of the S.D. is 0.094 nearly, and hence that the
S.D. of the population almost certainly lies between 6.7 and 6.9 years.

3. The S.D. of a random sample of 1,000 members is 5.9 years,
and that of an independent sample of 900 members is 6.1 years.
Show that the samples may reasonably be regarded as drawn from
equally variable normal populations, since the difference of their
standard deviations is very little greater than its S.E.

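4. Standard error of the coefficient of variation. By definition this
coefficient is $V = S/\bar{x} = \sqrt{m_2}/\bar{x}$. Logarithmic differentiation gives

$$\frac{\delta V}{V} = \frac{\delta m_2}{2m_2} - \frac{\delta\bar{x}}{\bar{x}}.$$

Squaring both sides and taking expected values show that, for
large samples,

$$E(\delta V)^2 = V^2\Big[\frac{\mu_4 - \mu_2^2}{4n\mu_2^2} + \frac{\mu_2}{n\bar{x}^2} - \frac{E(\delta\bar{x}\,\delta m_2)}{\mu_2\bar{x}}\Big].$$

It can be shown that, for samples from a normal population,
$E(\delta\bar{x}\,\delta m_2) = 0$. Show that in this case the above result leads to

S.E. of $V = V\sqrt{(1 + 2V^2)/2n}$.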

5. Standard error of a moment about the mean of a large sample.
Proceeding as in § 59 we have

$$nm_q = \sum_i (x_i - \bar{x})^q f_i = \sum_i x_i^q f_i - q\bar{x}\sum_i x_i^{q-1} f_i + \ldots,$$

and therefore

$$\delta m_q = \delta m'_q - q\,\delta\bar{x}\,m'_{q-1} - \ldots$$

If the origin is taken at the mean of the population, $\bar{x}$ becomes
identical with $\delta\bar{x}$; and, as we need retain only the terms of lowest
order, we neglect those after the first two. On squaring we may
write the result

$$(\delta m_q)^2 = (\delta m'_q)^2 - 2q\,m'_{q-1}\,\delta\bar{x}\,\delta m'_q + q^2 m'^{\,2}_{q-1}(\delta\bar{x})^2,$$

and on taking expected values of both sides we obtain, in virtue
of (15) and (19), the origin being the mean of the population,

$$E(\delta m_q)^2 = \frac{1}{n}\big(\mu_{2q} - \mu_q^2 - 2q\,\mu_{q-1}\mu_{q+1} + q^2\mu_{q-1}^2\mu_2\big).$$

The square root of this quantity is the S.E. of $m_q$.

6. Deduce that, for large samples from a normal population of
S.D. $\sigma$, the S.E.'s of $m_3$ and $m_4$ are $\sigma^3\sqrt{6/n}$ and $\sigma^4\sqrt{96/n}$ respectively.



7. A random sample of 3,600 pairs of values from a bivariate
normal population showed a correlation coefficient of 0.45. Find
limits to the correlation in the population.

Since the sample is large we may take 0.45 as an approximate
value for $\rho$. The S.E. of r is then (1 − 0.2025)/60 = 0.0133, so that
$3\epsilon$ = 0.04 nearly. The coefficient of correlation in the population
almost certainly lies between 0.41 and 0.49.

8. A random sample of 2,500 pairs of values from a bivariate
normal population showed a correlation of 0.1. Is this really
significant of correlation in the population?

On the assumption that the population is uncorrelated we have
the S.E. of r = 1/50 = 0.02. The actual value is 5 times this S.E.,
and is therefore significant.

Show that the above correlation of 0.1 would not be significant
in a sample of less than 400 pairs.

9. By means of the result of Examples vi, 18 (p. 129) deduce*
the formulae (15) of § 57 and (19) of § 58.

* Cf. Kendall, 1943, 2, pp. 205-6. 



CHAPTER VIII 
BETA AND GAMMA DISTRIBUTIONS 

65. Beta and Gamma Functions 

For the benefit of the student who is not familiar with them, we
shall first prove the elementary properties of the Beta and Gamma
functions. The integral

$$\Gamma(n) = \int_0^\infty e^{-x}x^{n-1}\,dx \qquad (1)$$

converges if n is positive. It is a function of n called the Gamma
Function. Clearly

$$\Gamma(1) = \int_0^\infty e^{-x}\,dx = 1. \qquad (2)$$

Also, if $n - 1$ is positive, we have on integration by parts

$$\Gamma(n) = \Big[-e^{-x}x^{n-1}\Big]_0^\infty + \int_0^\infty (n-1)x^{n-2}e^{-x}\,dx,$$

so that

$$\Gamma(n) = (n-1)\Gamma(n-1). \qquad (3)$$

Hence, if n is a positive integer,

$$\Gamma(n) = (n-1)(n-2)\ldots 2\cdot 1\cdot\Gamma(1) = (n-1)!. \qquad (4)$$

On account of the property expressed by (3) and (4), $\Gamma(n)$ is often
denoted by $(n-1)!$ whether n is integral or not. Also, on writing
$x^2$ in place of x in (1), we have the alternative formula

$$\Gamma(n) = 2\int_0^\infty x^{2n-1}\exp(-x^2)\,dx. \qquad (5)$$

And, by an obvious substitution, it is easily verified that, if a is
positive,

$$\int_0^\infty e^{-ax}x^{n-1}\,dx = a^{-n}\Gamma(n). \qquad (6)$$

The integral

$$B(m,n) = \int_0^1 x^{m-1}(1-x)^{n-1}\,dx \qquad (7)$$

also converges if m and n are positive. It is a function of m and n
called the Beta Function. That it is symmetrical in m and n is
easily shown by the substitution $z = 1 - x$. Then (7) becomes

$$B(m,n) = \int_0^1 z^{n-1}(1-z)^{m-1}\,dz = B(n,m).$$

Further, on substituting $x = \sin^2\theta$ in (7) we obtain

$$B(m,n) = 2\int_0^{\pi/2}\sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta, \qquad (8)$$

and therefore in particular

$$B(\tfrac{1}{2},\tfrac{1}{2}) = 2\int_0^{\pi/2}d\theta = \pi, \qquad (9)$$

while from (7) it is obvious that

$$B(1,1) = 1. \qquad (10)$$

Lastly the substitution $x = 1/(1+y)$ in (7) leads to an important
alternative definition, viz.

$$B(m,n) = \int_0^\infty \frac{y^{n-1}}{(1+y)^{m+n}}\,dy, \qquad (11)$$

and in this integral m and n may be interchanged, in virtue of the
symmetry of the function.

Example. Show that

$$B(m,n) = \int_0^1 \frac{x^{m-1} + x^{n-1}}{(1+x)^{m+n}}\,dx.$$

This may be deduced from (11) by dividing the range of integration into two
parts, 0 to 1 and 1 to ∞, and putting y = 1/x in the integration over the
second part.
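
Two of the results of this section lend themselves to a quick numerical check: the scaling formula (6), and the agreement between the defining integral (7) and the form given in the Example. The Python sketch below uses a plain midpoint rule; the helper `midpoint`, the parameter values and the truncation of the infinite range at x = 40 are assumptions made for illustration.

```python
from math import gamma, exp

def midpoint(f, a, b, steps=200_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Formula (6): the integral of exp(-a x) x^(n-1) over (0, infinity) against
# a^(-n) Gamma(n). The range is cut off at x = 40, beyond which the
# integrand is negligible for these assumed values.
a, n = 2.0, 2.5
lhs_6 = midpoint(lambda x: exp(-a * x) * x ** (n - 1), 0.0, 40.0)
rhs_6 = a ** (-n) * gamma(n)

# Definition (7) against the form of the Example, for assumed m and n.
m, n2 = 2.5, 3.5
beta_7 = midpoint(lambda x: x ** (m - 1) * (1 - x) ** (n2 - 1), 0.0, 1.0)
beta_example = midpoint(
    lambda x: (x ** (m - 1) + x ** (n2 - 1)) / (1 + x) ** (m + n2), 0.0, 1.0)
```

The exponents are taken greater than 1 so that both integrands stay finite at the ends of the range; smaller parameters would call for a more careful quadrature.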

66. Relation between the two functions 

That the Beta and Gamma functions are connected by the relation

$$B(m,n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m+n)} \qquad (12)$$

may be proved as follows. Consider the integrals

$$I_1 = 2\int_0^a x^{2m-1}\exp(-x^2)\,dx, \qquad I_2 = 2\int_0^a y^{2n-1}\exp(-y^2)\,dy,$$

whose limiting values as a tends to infinity are $\Gamma(m)$ and $\Gamma(n)$
respectively, in virtue of (5). Then

$$I_1I_2 = 4\int_0^a dx\int_0^a \exp(-x^2 - y^2)\,x^{2m-1}y^{2n-1}\,dy,$$

or, on changing to polar coordinates,

$$I_1I_2 = 4\iint \exp(-r^2)\,r^{2m+2n-1}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,dr\,d\theta,$$

the integration extending over the square OAGB (see Fig. 4, p. 66).
Since the integrand is positive, this integral is intermediate in value
between the integrals of the same function extended over the
quadrant OAB, of radius a, and the quadrant OPQ, of radius $a\sqrt{2}$.
Hence $I_1I_2$ lies in value between

$$4\int_0^{\pi/2}\cos^{2m-1}\theta\,\sin^{2n-1}\theta\,d\theta\int_0^a \exp(-r^2)\,r^{2m+2n-1}\,dr$$

and the corresponding integral with 0 and $a\sqrt{2}$ as the limits for r.
But, as a tends to infinity, each of these integrals tends to the limit
$B(m,n)\Gamma(m+n)$, while $I_1$ and $I_2$ tend to the limits $\Gamma(m)$ and $\Gamma(n)$
respectively. Consequently

$$B(m,n)\Gamma(m+n) = \Gamma(m)\Gamma(n)$$

as stated in (12).
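
Relation (12) gives a convenient way of computing the Beta function from the Gamma function. The following sketch compares it with a midpoint-rule evaluation of the defining integral and with the special values $B(\tfrac{1}{2},\tfrac{1}{2}) = \pi$ and $B(1,n) = 1/n$; the function names and the parameter values are illustrative assumptions.

```python
from math import gamma, pi

def beta(m, n):
    """B(m, n) computed from the Gamma function by relation (12)."""
    return gamma(m) * gamma(n) / gamma(m + n)

def beta_integral(m, n, steps=200_000):
    """Midpoint-rule value of the integral of x^(m-1) (1-x)^(n-1) over (0, 1)."""
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (m - 1) * (1 - (i + 0.5) * h) ** (n - 1)
                   for i in range(steps))

check_integral = beta_integral(2.5, 3.5) - beta(2.5, 3.5)   # should be near zero
half_half = beta(0.5, 0.5)        # compare with pi
one_n = beta(1, 7)                # compare with 1/7
gamma_half_sq = gamma(0.5) ** 2   # compare with pi
```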

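Putting $m = n = \tfrac{1}{2}$ in this result we have, in virtue of (2) and (9),

$$[\Gamma(\tfrac{1}{2})]^2 = B(\tfrac{1}{2},\tfrac{1}{2})\,\Gamma(1) = \pi.$$

Consequently

$$\Gamma(\tfrac{1}{2}) = \sqrt{\pi} \qquad (13)$$

or

$$\int_0^\infty \frac{e^{-x}}{\sqrt{x}}\,dx = \sqrt{\pi}.$$

By writing $x^2$ in place of x in this integral, or by putting $n = \tfrac{1}{2}$ in
(5), we deduce

$$\int_0^\infty \exp(-x^2)\,dx = \tfrac{1}{2}\sqrt{\pi}.$$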

Example 1. Show that, if m is a positive integer,

$$\Gamma(m + \tfrac{1}{2}) = \frac{2m-1}{2}\cdot\frac{2m-3}{2}\cdots\frac{3}{2}\cdot\frac{1}{2}\sqrt{\pi}.$$

Example 2. Show that, if m and n are positive integers,

$$B(m,n) = \frac{(m-1)!\,(n-1)!}{(m+n-1)!}.$$

In particular, $B(1,n) = 1/n$.

Example 3. Show that

$$B(m+1,n)/B(m,n) = m/(m+n)$$

and

$$B(m+2,n)/B(m,n) = m(m+1)/(m+n)(m+n+1).$$

Example 4. Show that

$$B(m+2,n-2)/B(m,n) = m(m+1)/(n-1)(n-2).$$

Example 5. Show that

$$\int_0^a (a-x)^{m-1}x^{n-1}\,dx = a^{m+n-1}B(m,n).$$



67. Gamma distribution and Gamma variates 

In virtue of (1) a continuous variable x, which is distributed
with probability density

$$\phi(x) = \frac{e^{-x}x^{l-1}}{\Gamma(l)}$$

throughout the range 0 to ∞, is called a Gamma variate with
parameter l; and its distribution is a Gamma distribution.* The factor
$1/\Gamma(l)$ ensures that the integral of $\phi(x)$ over the whole range of
values of x is unity. The reader should sketch the probability curve
$y = \phi(x)$ for the distribution. He will find that it is asymptotic to
the x-axis and that, if l > 1, it has a mode at $x = l - 1$. If l > 2 it also
touches the x-axis at the origin; while, if 1 < l < 2, it is tangent to the
y-axis at that point. If, however, 0 < l < 1, the curve is asymptotic
to both axes.

The expected value of the variate in the distribution is given by

$$E(x) = \int_0^\infty x\,\phi(x)\,dx = \Gamma(l+1)/\Gamma(l) = l.$$

The second moment about x = 0 is similarly

$$\mu'_2 = \int_0^\infty x^2\phi(x)\,dx = \Gamma(l+2)/\Gamma(l) = l(l+1).$$

* This belongs to Karl Pearson's Type III. See Kendall, 1943, 2, pp. 137-43.


Hence the variance is given by

$$\sigma^2 = l(l+1) - l^2 = l. \qquad (17)$$

Example. Show that the rth moment about x = 0 is

$$\mu'_r = \Gamma(l+r)/\Gamma(l) = l(l+1)\ldots(l+r-1),$$

and deduce that the third moment about the mean is 2l.
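
The moment formula of this Example is easily evaluated with the Gamma function, and the mean, variance and third moment about the mean follow from the first three moments about the origin. A short Python sketch, with the parameter value chosen arbitrarily for illustration:

```python
from math import gamma

def gamma_raw_moment(l, r):
    """r-th moment about x = 0 of a Gamma variate with parameter l."""
    return gamma(l + r) / gamma(l)

l = 2.5   # assumed parameter
m1, m2, m3 = (gamma_raw_moment(l, r) for r in (1, 2, 3))

variance = m2 - m1 ** 2                          # compare with l
third_central = m3 - 3 * m2 * m1 + 2 * m1 ** 3   # compare with 2l
```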

An important example of a Gamma variate is associated with the
normal distribution. If x is normally distributed with mean a and
S.D. $\sigma$, the probability that a random value of the variate will fall
in the interval dx is

$$dp = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big[-\frac{(x-a)^2}{2\sigma^2}\Big]dx. \qquad (18)$$

Let u be defined by

$$u = \tfrac{1}{2}(x-a)^2/\sigma^2,$$

so that, as x varies from −∞ to +∞, u varies from +∞ to 0 and then
from 0 to +∞. For values of x between a and +∞,

$$x - a = \sigma\sqrt{2u}$$

and

$$dx = \sigma\,du/\sqrt{2u},$$

so that

$$dp = \frac{1}{2\sqrt{\pi}}\,e^{-u}u^{-1/2}\,du.$$

But the probability that u falls in the interval du is double this,
since there is an equal probability that u will fall in this interval
when x lies between −∞ and a. Consequently the probability
differential for the variate u is

$$\frac{1}{\sqrt{\pi}}\,e^{-u}u^{-1/2}\,du = \frac{e^{-u}u^{\frac{1}{2}-1}}{\Gamma(\tfrac{1}{2})}\,du,$$

so that u is a Gamma variate with parameter $\tfrac{1}{2}$. We may therefore
state

THEOREM I. If x is normally distributed with mean a and standard
deviation $\sigma$, then $\tfrac{1}{2}(x-a)^2/\sigma^2$ is a Gamma variate with parameter $\tfrac{1}{2}$.

A Gamma variate with parameter l may be referred to briefly as
a $\gamma(l)$ variate, the symbol being used adjectively.*

* The objection to describing it as a $\Gamma(l)$ variate is the additional meaning
thus given to the symbol $\Gamma(l)$.

68. Sum of independent Gamma variates

The moments of the Gamma distribution, and the distribution
of the sum of independent Gamma variates, may be deduced from
the moment generating function. The m.g.f. of a $\gamma(l)$ variate with
respect to the origin is given by

$$M(t) = \frac{1}{\Gamma(l)}\int_0^\infty e^{tx}e^{-x}x^{l-1}\,dx = (1-t)^{-l} \quad (|t| < 1), \qquad (19)$$

and the cumulative function is therefore

$$K(t) = -l\log(1-t) = l\Big(t + \frac{t^2}{2} + \frac{t^3}{3} + \ldots\Big). \qquad (20)$$

Thus the mean of the distribution is l, and the variance also l,
while the other cumulants are

$$\kappa_r = (r-1)!\,l. \qquad (21)$$

Suppose now that x and y are independent Gamma variates with
parameters l and m respectively. Then the m.g.f. of their sum, being
equal to the product of their m.g.f.'s, is $(1-t)^{-(l+m)}$. But this is the
m.g.f. of a $\gamma(l+m)$ variate. We thus have

THEOREM II. The sum of two independent Gamma variates, with
parameters l and m, is a Gamma variate with parameter l + m.

The converse of this theorem is almost equally important. It
may be stated:

THEOREM III. If the sum of two independent positive variates is a
Gamma variate with parameter l + m, and one of them is a Gamma
variate with parameter l, then the other is a Gamma variate with
parameter m.

For, if M(t) denotes the m.g.f. of the last variate, we have, on
equating the m.g.f. of the sum to the product of those of its components,

$$(1-t)^{-(l+m)} = (1-t)^{-l} M(t),$$

whence

$$M(t) = (1-t)^{-m},$$

and the second component is therefore a $\gamma(m)$ variate.

On account of the importance of these theorems we shall prove
Theorem II from first principles. Let x and y be independent $\gamma(l)$
and $\gamma(m)$ variates, and let z = x + y. Suppose first that y has a fixed
value in the interval dy. Then dz = dx. The probability that a random
value of x will fall in the interval dx is

$$e^{-x} x^{l-1}\,dx/\Gamma(l),$$

and therefore the probability that, for a given value of y, z will lie
in the interval dz is

$$dp = e^{-(z-y)} (z-y)^{l-1}\,dz/\Gamma(l).$$

But the chance that y will have a value in the interval dy is

$$dp' = e^{-y} y^{m-1}\,dy/\Gamma(m).$$

The probability that simultaneously z will lie in the interval dz
and y in the interval dy is the product dp dp'. By integration we
then have the probability that, for any value of y, z will lie in the
interval dz as

$$dP = \frac{e^{-z}\,dz}{\Gamma(l)\Gamma(m)} \int_0^z (z-y)^{l-1} y^{m-1}\,dy.$$

To evaluate the integral put y = zt, so that dy = z dt. Since z is the
sum of the positive variates x and y, it is never less than y, so that
t lies within the range 0 to 1. Consequently

$$dP = \frac{e^{-z} z^{l+m-1}\,dz}{\Gamma(l)\Gamma(m)} \int_0^1 (1-t)^{l-1} t^{m-1}\,dt = \frac{e^{-z} z^{l+m-1}\,dz}{\Gamma(l+m)},$$

and z is therefore a $\gamma(l+m)$ variate as stated.
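The integration used in this proof of Theorem II can be imitated numerically. The sketch below, in Python, evaluates the integral over y by the midpoint rule for one assumed choice of l, m and z, and compares the result with the density of a Gamma variate with parameter l + m at the same point. The function names and the numerical values are illustrative assumptions.

```python
import math


def gamma_density(x, l):
    """Probability density exp(-x) x^(l-1) / Gamma(l) of a Gamma variate."""
    return math.exp(-x) * x ** (l - 1) / math.gamma(l)


def density_of_sum(z, l, m, steps=2000):
    """Midpoint-rule value of the integral over y from 0 to z of
    (density of x at z - y) times (density of y at y)."""
    h = z / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        total += gamma_density(z - y, l) * gamma_density(y, m)
    return total * h


# Assumed illustrative parameters and point of evaluation.
l, m, z = 2.0, 3.0, 4.0
convolved = density_of_sum(z, l, m)
direct = gamma_density(z, l + m)
print(convolved, direct)  # the two values should agree closely
```

Parameters not less than 1 are chosen here so that the integrand remains finite at both ends of the range; smaller parameters would call for a more careful quadrature.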

Repeated application of this theorem shows that the sum of n
independent Gamma variates with parameters $m_i$ (i = 1, 2, ..., n) is
a Gamma variate with parameter $\sum m_i$. In particular, in virtue of
Theorem I, we may state

THEOREM IV. If $x_i$ (i = 1, 2, ..., n) are n independent variates,
normally distributed about a common mean zero with standard deviations
$\sigma_i$, and $\chi^2 = \sum_i x_i^2/\sigma_i^2$, then $\tfrac12\chi^2$ is a Gamma variate with
parameter $\tfrac12 n$.

69. Beta distribution of the first kind

In virtue of the first definition, (7), of the Beta function we shall
say that a continuous variate x, which is distributed with probability
density

$$\frac{x^{l-1}(1-x)^{m-1}}{B(l, m)} \tag{22}$$

throughout the range of values 0 to 1, is a Beta variate of the first
kind with parameters l and m; and its distribution is a Beta distribution
of the first kind.* Such a variate may be referred to briefly as a
$\beta_1(l, m)$ variate. The reader should sketch the probability curve for
the distribution, distinguishing the different cases. He will find that,
if l and m are each greater than 1, there is a modal value

$$x = (l-1)/(l+m-2).$$

If l > 2 the curve touches the x-axis at the origin; while, if 1 < l < 2,
it is tangent to the y-axis at that point. If, however, 0 < l < 1, the
curve is asymptotic to the y-axis. Similar remarks hold for the
shape of the curve near x = 1, according to the value of m.
The mean value of x is given by

$$E(x) = \int_0^1 \frac{x^{l}(1-x)^{m-1}\,dx}{B(l, m)} = \frac{B(l+1, m)}{B(l, m)} = \frac{l}{l+m}. \tag{23}$$

The second moment $\mu_2'$ about x = 0 is similarly

$$\mu_2' = \frac{B(l+2, m)}{B(l, m)} = \frac{l(l+1)}{(l+m)(l+m+1)}.$$

From these it follows that the variance is

$$\sigma^2 = \frac{l(l+1)}{(l+m)(l+m+1)} - \frac{l^2}{(l+m)^2} = \frac{lm}{(l+m)^2(l+m+1)}. \tag{24}$$
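The mode, mean and variance of a Beta variate of the first kind are easily tabulated and checked by machine. The following Python sketch codes the closed forms l/(l + m) and lm/((l + m)^2 (l + m + 1)), together with the modal value (l - 1)/(l + m - 2), and compares the first two with moments found by midpoint-rule quadrature of the density. All function names and the parameter values are illustrative assumptions.

```python
import math


def beta_function(l, m):
    """B(l, m) expressed by Gamma functions."""
    return math.gamma(l) * math.gamma(m) / math.gamma(l + m)


def beta1_density(x, l, m):
    """Density of a Beta variate of the first kind on the range 0 to 1."""
    return x ** (l - 1) * (1 - x) ** (m - 1) / beta_function(l, m)


def beta1_summary(l, m):
    """Closed-form mean, variance and (when l, m > 1) mode."""
    mean = l / (l + m)
    variance = l * m / ((l + m) ** 2 * (l + m + 1))
    mode = (l - 1) / (l + m - 2) if (l > 1 and m > 1) else None
    return mean, variance, mode


def beta1_moment(r, l, m, steps=20_000):
    """rth moment about x = 0 by the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += x ** r * beta1_density(x, l, m)
    return total * h


l, m = 3.0, 2.0  # assumed illustrative parameters
mean, variance, mode = beta1_summary(l, m)
numeric_mean = beta1_moment(1, l, m)
numeric_variance = beta1_moment(2, l, m) - numeric_mean ** 2
print(mean, numeric_mean)
print(variance, numeric_variance)
print(mode)
```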



Corresponding to Theorem II there is a fundamental theorem
which may be stated:

THEOREM V. If x and y are independent Gamma variates with
parameters l and m respectively, the quotient x/(x + y) is a Beta variate
of the first kind with parameters l and m.

This may be proved from first principles. If we write

$$z = \frac{x}{x+y}, \quad\text{then}\quad x = \frac{yz}{1-z},$$

and since x and y are both positive, the range of z is from 0 to 1.
Suppose first that y has a fixed value in the interval dy. Then

$$dx = y\,dz/(1-z)^2.$$

The probability that, for this value of y, x will lie in the interval dx
and therefore z in the interval dz is

$$dp = \frac{1}{\Gamma(l)} \exp\left(-\frac{yz}{1-z}\right) \left(\frac{yz}{1-z}\right)^{l-1} \frac{y\,dz}{(1-z)^2}.$$

But the chance that y will have a value in the interval dy is

$$dp' = e^{-y} y^{m-1}\,dy/\Gamma(m).$$

The probability that simultaneously z will lie in the interval dz and
y in the interval dy is the product dp dp'. By integration we then find
the probability that, for any value of y, z will lie in the interval dz as

$$dP = \frac{z^{l-1}\,dz}{\Gamma(l)\Gamma(m)(1-z)^{l+1}} \int_0^\infty \exp\left(-\frac{y}{1-z}\right) y^{l+m-1}\,dy.$$

To evaluate the integral put $y = (1-z)\xi$. Then the limits for $\xi$ are
0 and $\infty$, and we obtain

$$dP = \frac{z^{l-1}(1-z)^{m-1}\,dz}{\Gamma(l)\Gamma(m)} \int_0^\infty e^{-\xi} \xi^{l+m-1}\,d\xi = \frac{z^{l-1}(1-z)^{m-1}\,dz}{B(l, m)},$$

showing that z is a $\beta_1(l, m)$ variate as stated.

* This belongs to Karl Pearson's Type I.

We may remark in passing that, if z is a $\beta_1(l, m)$ variate and
v = 1 - z, then v is a $\beta_1(m, l)$ variate. For the range of v is also from
0 to 1, and |dv| = |dz|. Expressing the probability differential for
z in terms of v and dv, we obtain

$$dP = v^{m-1}(1-v)^{l-1}\,dv/B(l, m),$$

which shows that v is a $\beta_1(m, l)$ variate. And the reader will observe
that dP depends on the magnitude |dv| of the interval dv, but not
upon the sign of dv/dz.
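Theorem V lends itself to a simple sampling experiment. The Python sketch below draws independent Gamma variates with the standard-library routine `random.gammavariate` (shape parameter first, scale parameter 1), forms the quotient x/(x + y), and compares its sample mean and variance with l/(l + m) and lm/((l + m)^2 (l + m + 1)). The helper names, the seed and the parameter values are illustrative assumptions.

```python
import random


def quotient_samples(l, m, n_samples, seed=1):
    """Values of z = x / (x + y) for independent Gamma variates x and y."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    values = []
    for _ in range(n_samples):
        x = rng.gammavariate(l, 1.0)  # parameter l, unit scale
        y = rng.gammavariate(m, 1.0)  # parameter m, unit scale
        values.append(x / (x + y))
    return values


def mean_and_variance(values):
    """Mean and (divisor n) variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


l, m = 2.0, 3.0  # assumed illustrative parameters
z_values = quotient_samples(l, m, n_samples=100_000)
z_mean, z_variance = mean_and_variance(z_values)
print(z_mean, l / (l + m))
print(z_variance, l * m / ((l + m) ** 2 * (l + m + 1)))

# The complement v = 1 - z should behave as a Beta variate with the
# parameters interchanged, so that its mean is near m / (l + m).
v_mean = 1.0 - z_mean
print(v_mean, m / (l + m))
```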

70. Alternative proof of theorems 

A combined proof of Theorems II and V may be given more
simply as follows.* As before let x and y be independent $\gamma(l)$ and
$\gamma(m)$ variates respectively. Then the probability that a random
value of x will fall in the interval dx and, at the same time, a random
value of y will fall in the interval dy is

$$dP = \frac{e^{-x-y} x^{l-1} y^{m-1}\,dx\,dy}{\Gamma(l)\Gamma(m)}.$$

* This proof is given by Sawkins, 1940, 3, p. 212.

Now introduce the new variables

$$u = x + y, \quad v = x/(x+y),$$

so that

$$x = uv, \quad y = u(1-v).$$

Then as x and y range from 0 to $\infty$, u ranges from 0 to $\infty$ and v from
0 to 1. Also

$$\frac{\partial(x, y)}{\partial(u, v)} = -u.$$

From the above expression for the probability differential dP it
follows that the probability density in the joint distribution of
x and y is

$$\frac{e^{-x-y} x^{l-1} y^{m-1}}{\Gamma(l)\Gamma(m)}.$$

But the area of the element of the xy-plane bounded by the curves
along which u has the values u and u + du respectively, and those
along which v has the values v and v + dv is

$$\left|\frac{\partial(x, y)}{\partial(u, v)}\right| du\,dv = u\,du\,dv.$$

Hence the probability that, in a random selection of x and y, the
representative point (x, y) will fall in this element of area is

$$dP = \frac{e^{-u} (uv)^{l-1} \{u(1-v)\}^{m-1}\, u\,du\,dv}{\Gamma(l)\Gamma(m)} = \frac{e^{-u} u^{l+m-1}\,du}{\Gamma(l+m)} \cdot \frac{v^{l-1}(1-v)^{m-1}\,dv}{B(l, m)}. \tag{25}$$

Since this is the probability that simultaneously u will fall in the
interval du and v in the interval dv it follows that these variates
are independent, and that u is a $\gamma(l+m)$ variate and v a $\beta_1(l, m)$
variate, as stated in the above theorems.
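Both steps of this combined proof, the value of the Jacobian and the factorisation of the probability differential, can be verified at a single point by direct computation. In the Python sketch below the Jacobian of (x, y) with respect to (u, v) is estimated by central differences, and the joint density of x and y multiplied by u is compared with the product of the Gamma density in u and the Beta density in v. The function names and the point chosen are illustrative assumptions.

```python
import math


def gamma_density(x, l):
    """Density exp(-x) x^(l-1) / Gamma(l)."""
    return math.exp(-x) * x ** (l - 1) / math.gamma(l)


def beta1_density(v, l, m):
    """Density v^(l-1) (1-v)^(m-1) / B(l, m)."""
    beta_lm = math.gamma(l) * math.gamma(m) / math.gamma(l + m)
    return v ** (l - 1) * (1 - v) ** (m - 1) / beta_lm


def jacobian_xy_wrt_uv(u, v, h=1e-6):
    """Central-difference estimate of d(x, y)/d(u, v) for x = uv, y = u(1-v)."""
    def xy(u_, v_):
        return u_ * v_, u_ * (1 - v_)

    x_up, y_up = xy(u + h, v)
    x_um, y_um = xy(u - h, v)
    x_vp, y_vp = xy(u, v + h)
    x_vm, y_vm = xy(u, v - h)
    dx_du, dy_du = (x_up - x_um) / (2 * h), (y_up - y_um) / (2 * h)
    dx_dv, dy_dv = (x_vp - x_vm) / (2 * h), (y_vp - y_vm) / (2 * h)
    return dx_du * dy_dv - dx_dv * dy_du


# Assumed illustrative parameters and point (u, v).
l, m = 2.5, 1.5
u, v = 3.0, 0.4
x, y = u * v, u * (1 - v)

jacobian = jacobian_xy_wrt_uv(u, v)
joint_in_uv = gamma_density(x, l) * gamma_density(y, m) * u
product = gamma_density(u, l + m) * beta1_density(v, l, m)
print(jacobian)              # should be close to -u
print(joint_in_uv, product)  # should agree
```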
71. Product of a $\beta_1(l, m)$ variate and a $\gamma(l+m)$ variate

We may now prove an important property, suggested by Theorem
V, which may be stated:

THEOREM VI. The product of a $\beta_1(l, m)$ variate and an independent
$\gamma(l+m)$ variate is a $\gamma(l)$ variate.

Let v be the $\beta_1(l, m)$ variate and u the $\gamma(l+m)$ variate, and let
z = uv. The probability differential for v is

$$v^{l-1}(1-v)^{m-1}\,dv/B(l, m).$$

Hence, for a fixed value of u in the interval du, the probability that
z will fall in the interval dz is

$$\frac{1}{B(l, m)} \left(\frac{z}{u}\right)^{l-1} \left(1 - \frac{z}{u}\right)^{m-1} \frac{dz}{u}.$$

Multiply this by the probability that a random value of u will fall
in the interval du, and integrate over the range of u from z to $\infty$
(since z ≤ u). Thus we have the probability that, for any value of
u, z will lie in the interval dz as

$$dP = \frac{z^{l-1}\,dz}{B(l, m)\Gamma(l+m)} \int_z^\infty e^{-u} \left(1 - \frac{z}{u}\right)^{m-1} u^{m-1}\,du = \frac{z^{l-1}\,dz}{\Gamma(l)\Gamma(m)} \int_z^\infty e^{-u} (u-z)^{m-1}\,du.$$

To evaluate the integral put $u - z = \xi$. Then $\xi$ ranges from 0 to $\infty$,
and we have for the probability differential of z,

$$dP = \frac{e^{-z} z^{l-1}\,dz}{\Gamma(l)\Gamma(m)} \int_0^\infty e^{-\xi} \xi^{m-1}\,d\xi = \frac{e^{-z} z^{l-1}\,dz}{\Gamma(l)}.$$

Consequently z is a $\gamma(l)$ variate as stated.
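A sampling experiment gives a rough check of Theorem VI. In the Python sketch below a Beta variate of the first kind is drawn with `random.betavariate` and an independent Gamma variate with `random.gammavariate`; the sample mean and variance of their product should both be near l, as for a Gamma variate with parameter l. The helper names, the seed and the parameters are illustrative assumptions.

```python
import random


def product_samples(l, m, n_samples, seed=4):
    """Values of z = u v, with v a Beta(l, m) variate of the first kind
    and u an independent Gamma variate with parameter l + m."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    values = []
    for _ in range(n_samples):
        v = rng.betavariate(l, m)
        u = rng.gammavariate(l + m, 1.0)
        values.append(u * v)
    return values


def mean_and_variance(values):
    """Mean and (divisor n) variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance


l, m = 2.0, 3.0  # assumed illustrative parameters
z_values = product_samples(l, m, n_samples=100_000)
z_mean, z_variance = mean_and_variance(z_values)
print(z_mean, z_variance)  # both should lie close to l = 2
```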

72. Beta distribution of the second kind 

In agreement with the alternative definition of B(m, n) contained
in (11) we may define a Beta variate of the second kind, with positive
parameters l and m, as a continuous variate x which is distributed with
probability density

$$\frac{x^{l-1}}{B(l, m)(1+x)^{l+m}} \tag{27}$$

throughout the range x = 0 to x = $\infty$. Such a variate may be referred
to briefly as a $\beta_2(l, m)$ variate. Its distribution is a Beta distribution
of the second kind.* It is important to remember that, while the
values of a Beta variate of the first kind are from 0 to 1, those of a
Beta variate of the second kind are from 0 to $\infty$. The reader should
sketch the different forms of the probability curve for the latter
distribution. He will observe a considerable resemblance to the
case of a Gamma distribution. If l > 1 there is a mode at

$$x = (l-1)/(m+1). \tag{28}$$

The curve is asymptotic to the x-axis; and, if l > 2, it touches this
axis also at the origin. If 1 < l < 2 the curve touches the y-axis at
the origin; and if 0 < l < 1 the curve is asymptotic to both axes.
When m > 1 the mean value of the variate is given by

$$E(x) = \int_0^\infty \frac{x^{l}\,dx}{B(l, m)(1+x)^{l+m}} = \frac{B(l+1, m-1)}{B(l, m)} = \frac{l}{m-1}. \tag{29}$$

Also, if m > 2, the second moment about x = 0 is

$$\mu_2' = \int_0^\infty \frac{x^{l+1}\,dx}{B(l, m)(1+x)^{l+m}} = \frac{B(l+2, m-2)}{B(l, m)} = \frac{l(l+1)}{(m-1)(m-2)}.$$

Consequently the variance is

$$\sigma^2 = \frac{l(l+1)}{(m-1)(m-2)} - \frac{l^2}{(m-1)^2} = \frac{l(l+m-1)}{(m-1)^2(m-2)}. \tag{30}$$

We may observe that, if x is a Beta variate of the second kind,
its reciprocal is a variate of the same kind with parameters interchanged.
For, if x is a $\beta_2(l, m)$ variate, its probability density is
given by (27). If then we put x = 1/y, we have $|dx| = |dy|/y^2$;
and therefore, since y lies in the interval dy when x lies in the
interval dx, the probability of this is

$$dP = \frac{(1/y)^{l-1}\,dy}{B(l, m)(1 + 1/y)^{l+m}\, y^2} = \frac{y^{m-1}\,dy}{B(m, l)(1+y)^{l+m}}.$$

Consequently y is a $\beta_2(m, l)$ variate.

* This belongs to Karl Pearson's Type VI.
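The formulae (28) to (30), and the remark on the reciprocal, may be checked numerically. The Python sketch below codes the density of a Beta variate of the second kind, evaluates its moments by the midpoint rule after the substitution x = v/(1 - v), which brings the infinite range to the interval 0 to 1, and tests at one point that the density of 1/x is that of a variate with the parameters interchanged. The substitution, the function names and the parameter values are assumptions made for the illustration.

```python
import math


def beta_function(l, m):
    """B(l, m) expressed by Gamma functions."""
    return math.gamma(l) * math.gamma(m) / math.gamma(l + m)


def beta2_density(x, l, m):
    """Density x^(l-1) / (B(l, m) (1 + x)^(l+m)) on the range 0 to infinity."""
    return x ** (l - 1) / (beta_function(l, m) * (1 + x) ** (l + m))


def beta2_summary(l, m):
    """Closed-form mode (l > 1), mean (m > 1) and variance (m > 2)."""
    mode = (l - 1) / (m + 1) if l > 1 else None
    mean = l / (m - 1) if m > 1 else None
    variance = l * (l + m - 1) / ((m - 1) ** 2 * (m - 2)) if m > 2 else None
    return mode, mean, variance


def beta2_moment(r, l, m, steps=20_000):
    """rth moment about x = 0, using x = v/(1-v) and dx = dv/(1-v)^2."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        x = v / (1 - v)
        total += x ** r * beta2_density(x, l, m) / (1 - v) ** 2
    return total * h


l, m = 2.0, 5.0  # assumed illustrative parameters
mode, mean, variance = beta2_summary(l, m)
numeric_mean = beta2_moment(1, l, m)
numeric_variance = beta2_moment(2, l, m) - numeric_mean ** 2
print(mode, mean, variance)
print(numeric_mean, numeric_variance)

# Reciprocal: the density of y = 1/x at y is (density of x at 1/y) / y^2.
y = 0.7
reciprocal_density = beta2_density(1 / y, l, m) / y ** 2
interchanged_density = beta2_density(y, m, l)
print(reciprocal_density, interchanged_density)
```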
An important relation between the two kinds of Beta variates
is expressed by

THEOREM VII. To each Beta variate of the first kind corresponds a
pair of Beta variates of the second kind; and conversely.

Let v be a $\beta_1(l, m)$ variate, so that its probability differential is

$$dP = v^{l-1}(1-v)^{m-1}\,dv/B(l, m), \tag{31}$$

and let w be defined by

$$v = 1/(1+w) \tag{32}$$

or its equivalent w = (1 - v)/v.

Then $|dv| = |dw|/(1+w)^2$, and by substitution we have the probability
differential of w as

$$dP = \frac{w^{m-1}\,dw}{B(m, l)(1+w)^{l+m}}. \tag{33}$$

Since the range of w is from 0 to $\infty$, it follows that w is a $\beta_2(m, l)$
variate. Its reciprocal is therefore a $\beta_2(l, m)$ variate, and the first
part of the theorem is proved.

Conversely, given that w is a $\beta_2(m, l)$ variate with probability
differential (33), we may define a variate v by means of (32).
Substitution then shows that the probability differential of v is given by
(31); and since v ranges from 0 to 1, it is a $\beta_1(l, m)$ variate. The
variable 1 - v is therefore a $\beta_1(m, l)$ variate, and the second part of
the theorem is proved.

73. Quotient of independent Gamma variates 

Corresponding to Theorem V, according to which a Beta variate
of the first kind is determined by two independent Gamma variates,
we have

THEOREM VIII. The quotient of two independent Gamma variates,
with parameters l and m, is a $\beta_2(l, m)$ variate.

Let x and y be independent Gamma variates with parameters l
and m respectively, and let v = y/(y + x). Then, in virtue of Theorem
V, v is a $\beta_1(m, l)$ variate. But, if w is the quotient x/y, clearly

$$v = \frac{1}{1 + x/y} = \frac{1}{1+w},$$

so that w is a $\beta_2(l, m)$ variate as stated.

The theorem may also be proved by the method of § 70, which
leads to the additional information that the sum x + y is distributed
independently of the quotient x/y. For if

$$u = x + y, \quad v = x/y,$$

we have

$$x = \frac{uv}{1+v}, \quad y = \frac{u}{1+v},$$

and therefore

$$\frac{\partial(u, v)}{\partial(x, y)} = -\frac{x+y}{y^2} = -\frac{(1+v)^2}{u}, \qquad \frac{\partial(x, y)}{\partial(u, v)} = -\frac{u}{(1+v)^2}.$$

Since the probability that x and y fall simultaneously in the
intervals dx and dy is

$$\frac{e^{-x-y} x^{l-1} y^{m-1}\,dx\,dy}{\Gamma(l)\Gamma(m)},$$

the probability density for their joint distribution is

$$\frac{e^{-u} u^{l+m-2} v^{l-1}}{\Gamma(l)\Gamma(m)(1+v)^{l+m-2}}.$$

Consequently the probability that, in a random choice of x and y,
the representative point (x, y) will fall in the area bounded by the
curves u, u + du, v, v + dv is

$$\frac{e^{-u} u^{l+m-1}\,du}{\Gamma(l+m)} \times \frac{v^{l-1}\,dv}{B(l, m)(1+v)^{l+m}}.$$

Since this is the probability that u and v will fall simultaneously in
the intervals du and dv respectively, it follows that u is a $\gamma(l+m)$
variate, and v a $\beta_2(l, m)$ variate, and also that these variates are
independent.

Example 1. Cauchy's distribution. The distribution of the quotient of two
independent standard normal variates follows from the above theorem. For
if z is this quotient, $\tfrac12 z^2$ divided by itself in form, that is $z^2$, is the quotient
of two independent Gamma variates, each of parameter $\tfrac12$. Consequently
$z^2$ is a $\beta_2(\tfrac12, \tfrac12)$ variate, with probability differential

$$\frac{(z^2)^{-1/2}\,d(z^2)}{B(\tfrac12, \tfrac12)(1+z^2)} = \frac{2\,dz}{\pi(1+z^2)}.$$

The distribution of z follows immediately from this. For, since the range of
z is from $-\infty$ to $+\infty$ while that of $z^2$ is from 0 to $+\infty$, the probability
differential of z is

$$dP = \frac{dz}{\pi(1+z^2)},$$

the factor 2 disappearing, since the integral from $-\infty$ to $+\infty$ must be equal
to unity. This distribution is associated with the name of Cauchy.
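The Cauchy form dz / (pi (1 + z^2)) implies that the probability of a value not exceeding t is 1/2 + arctan(t)/pi, so that one half of all values lie between -1 and +1. The Python sketch below compares these figures with the proportions found in a simulated sample of quotients of independent standard normal values. The function names, the seed and the sample size are illustrative assumptions.

```python
import math
import random


def cauchy_cdf(t):
    """Integral of 1 / (pi (1 + z^2)) from minus infinity to t."""
    return 0.5 + math.atan(t) / math.pi


def ratio_samples(n_samples, seed=2):
    """Quotients of pairs of independent standard normal values."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    values = []
    while len(values) < n_samples:
        numerator = rng.gauss(0.0, 1.0)
        denominator = rng.gauss(0.0, 1.0)
        if denominator != 0.0:  # guard against an exact zero divisor
            values.append(numerator / denominator)
    return values


z_values = ratio_samples(100_000)
fraction_within_one = sum(1 for z in z_values if abs(z) <= 1.0) / len(z_values)
fraction_below_two = sum(1 for z in z_values if z <= 2.0) / len(z_values)
print(fraction_within_one, cauchy_cdf(1.0) - cauchy_cdf(-1.0))
print(fraction_below_two, cauchy_cdf(2.0))
```

Proportions are compared here rather than moments, since the integrals defining the mean and variance of this distribution do not converge.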
Example 2. Show that, if x and y are independent normal variates with
means $m_1$, $m_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively, the quotient

$$z = \frac{x - m_1}{y - m_2}$$

conforms to the distribution

$$dP = \frac{\sigma_1 \sigma_2\,dz}{\pi(\sigma_1^2 + \sigma_2^2 z^2)},$$

with a range $-\infty$ to $+\infty$.
EXAMPLES VIII

1. We shall see later (cf. § 82) that, if r is the coefficient of correlation
between the variates in a random sample of n pairs of values
from an uncorrelated bivariate normal population, then $r^2$ is a
$\beta_1(\tfrac12, \tfrac12(n-2))$ variate. Hence show that the mean value of $r^2$ for
such samples is 1/(n - 1), and the S.E. of r therefore $1/\sqrt{n-1}$. Also
show that the probability differential of r is

$$dP = \frac{(1-r^2)^{\frac12(n-4)}\,dr}{B(\tfrac12, \tfrac12(n-2))}.$$

2. Quotient of independent Gamma variates. Theorem VIII may
be proved by direct integration, as in the cases of Theorems II, V
and VI. Let z = x/y, where x and y are independent $\gamma(l)$ and $\gamma(m)$
variates. Then for a fixed value of y in the interval dy, dx = y dz,
and the probability that z will lie in the interval dz is

$$e^{-yz} (yz)^{l-1}\, y\,dz/\Gamma(l).$$

Consequently the probability that, for any value of y, z will lie in
the interval dz is

$$dP = \frac{z^{l-1}\,dz}{\Gamma(l)\Gamma(m)} \int_0^\infty e^{-y(1+z)} y^{l+m-1}\,dy = \frac{\Gamma(l+m)\, z^{l-1}\,dz}{\Gamma(l)\Gamma(m)(1+z)^{l+m}} = \frac{z^{l-1}\,dz}{B(l, m)(1+z)^{l+m}},$$

showing that z is a $\beta_2(l, m)$ variate as required.
3. Show that, for the $\gamma(l)$ distribution,

$$(\text{mean} - \text{mode})/\sigma = 1/\sqrt{l} = \tfrac12 \mu_3/\sigma^3.$$

Hence some writers prefer $\tfrac12\mu_3/\sigma^3$ to $\mu_3/\sigma^3$ as a definition of skewness.
Show that the excess of kurtosis of the distribution is 6/l.

4. Show that the mean value of the positive square root of a
$\gamma(l)$ variate is $\Gamma(l+\tfrac12)/\Gamma(l)$. Hence prove that the mean deviation of
a normal variate from its mean is $\sigma\sqrt{2/\pi}$.

5. Prove that the mean value of the positive square root of a
$\beta_1(l, m)$ variate is $\Gamma(l+\tfrac12)\Gamma(l+m)/\Gamma(l)\Gamma(l+m+\tfrac12)$. Hence, using
the distribution of $r^2$ given in Ex. 1, for samples from an uncorrelated
bivariate normal population, show that the mean value of |r| is

$$\frac{\Gamma(\tfrac12(n-1))}{\sqrt{\pi}\;\Gamma(\tfrac12 n)}.$$

6. A simple sample of n values is drawn from a population with a
$\gamma(l)$ distribution. Show that, if $\bar{x}$ is the mean of the sample, $n\bar{x}$ is
a $\gamma(nl)$ variate; and deduce that $E(\bar{x}) = l$, and that the sampling
variance of $\bar{x}$ is l/n.

7. A simple sample of n values is drawn from a population with
the exponential distribution whose probability density is $a e^{-ax}$,
(0 ≤ x). Show that, if $\bar{x}$ is the mean of the sample, $na\bar{x}$ is a $\gamma(n)$
variate; and deduce that $E(\bar{x}) = 1/a$, and that the S.E. of $\bar{x}$ is $1/(a\sqrt{n})$.

8. Show that, if v is the square of a $\gamma(l)$ variate, its probability
differential is $dp = \exp(-\sqrt{v})\, v^{\frac12 l - 1}\,dv / 2\Gamma(l)$.

9. Show that, if $x_i$ (i = 1, ..., n) are n independent Gamma variates
with parameters $m_i$, any homogeneous function $F(x_1, \dots, x_n)$ of
these variates, of degree 0, is distributed independently of the sum
$\sum_i x_i$. (Cf. Pitman, 1937, 8, pp. 216-17.)

This may be done by considering the m.g.f. of the simultaneous
distribution of F and $\sum x_i$. As explained in § 31 this m.g.f. is

$$M(t_1, t_2) = E\left[\exp\left(t_1 \sum x_i + t_2 F\right)\right] = C \int_0^\infty \dots \int_0^\infty \prod_i x_i^{m_i-1} \exp\left(-\sum x_i\right) \exp\left(t_1 \sum x_i + t_2 F\right) dx_1 \dots dx_n,$$

where $1/C = \Gamma(m_1) \dots \Gamma(m_n)$.

Substituting $x_i(1 - t_1) = y_i$ we have

$$M(t_1, t_2) = C\,(1-t_1)^{-\sum m_i} \int_0^\infty \dots \int_0^\infty \prod_i y_i^{m_i-1} \exp\left(-\sum y_i + t_2 F(y_1, \dots, y_n)\right) dy_1 \dots dy_n,$$

since F is homogeneous of degree zero. Thus

$$M(t_1, t_2) = (\text{function of } t_1) \times (\text{function of } t_2),$$

and the variates F and $\sum x_i$ are therefore independent.

10. Given that the incomplete Beta function $B_x(l, m)$ is defined by

$$B_x(l, m) = \int_0^x t^{l-1}(1-t)^{m-1}\,dt,$$

and that $I_x(l, m) = B_x(l, m)/B(l, m)$, prove the relations



and 

11. Simultaneous sampling distribution of the mean and the
variance. The independence of the sampling distributions of the
mean $\bar{x}$ and the variance $S^2$, of a random sample of n values from a
normal population, was proved by Fisher from the simultaneous
sampling distribution of these statistics. The probability that the
n values of the sample will fall in the respective intervals $dx_1, \dots, dx_n$
is, by the theorem of compound probability,

$$dp = (\sigma\sqrt{2\pi})^{-n} \exp\left[-\sum_i (x_i - \mu)^2/2\sigma^2\right] dx_1 \dots dx_n,$$

the n values being chosen independently from the normal population
of mean $\mu$, and variance $\sigma^2$. But

$$\sum_i (x_i - \mu)^2 = \sum_i (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 = nS^2 + n(\bar{x} - \mu)^2,$$

so that

$$dp = (\sigma\sqrt{2\pi})^{-n} \exp\left[-n(\bar{x} - \mu)^2/2\sigma^2\right] \exp\left(-nS^2/2\sigma^2\right) dx_1 \dots dx_n.$$

Fisher proved by geometrical reasoning* that this probability
differential is expressible as

$$dp = C \exp\left[-n(\bar{x} - \mu)^2/2\sigma^2\right] d\bar{x} \cdot \exp\left(-nS^2/2\sigma^2\right) S^{n-2}\,dS,$$

where C is a constant. Since this is of the form

$$dp = \phi_1(\bar{x})\,d\bar{x} \cdot \phi_2(S)\,dS,$$

the distributions of $\bar{x}$ and S are independent. The forms of $\phi_1(\bar{x})$
and $\phi_2(S)$ show that $\bar{x}$ is normally distributed, with mean $\mu$ and
variance $\sigma^2/n$, and that $\tfrac12 nS^2/\sigma^2$ is a Gamma variate with parameter
$\tfrac12(n-1)$.

Another proof of the independence of $\bar{x}$ and $S^2$ will be given in § 77.

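12. Show that the rth moment of a $\beta_1(l, m)$ distribution about
x = 0 is

$$\mu_r' = \frac{B(l+r, m)}{B(l, m)} = \frac{\Gamma(l+r)\Gamma(l+m)}{\Gamma(l)\Gamma(l+m+r)}.$$

13. Show that, if r is less than m, the rth moment of a $\beta_2(l, m)$
distribution about x = 0 is

$$\mu_r' = \frac{B(l+r, m-r)}{B(l, m)} = \frac{\Gamma(l+r)\Gamma(m-r)}{\Gamma(l)\Gamma(m)}.$$

14. Defining the harmonic mean (H.M.) of a variate x as the
reciprocal of the expected value of 1/x show that, if l > 1, the H.M.
of a $\gamma(l)$ variate is l - 1, that of a $\beta_1(l, m)$ variate is (l - 1)/(l + m - 1),
and that of a $\beta_2(l, m)$ variate is (l - 1)/m, m being positive.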

* Fisher, 1925, 1, pp. 92-3. 



CHAPTER IX 
CHI-SQUARE AND SOME APPLICATIONS 

74. Chi-square and its distribution 

We have already seen (§ 68, Theorem IV) that if $x_i$ (i = 1, 2, ..., n)
are n independent variates normally distributed about a common
mean zero with standard deviations $\sigma_i$, and

$$\chi^2 = \sum_i x_i^2/\sigma_i^2, \tag{1}$$

then $\tfrac12\chi^2$ is a Gamma variate with parameter $\tfrac12 n$. The distribution
of $\chi^2$ is therefore

$$dp = \frac{1}{\Gamma(\tfrac12 n)} \left(\tfrac12\chi^2\right)^{\frac12(n-2)} \exp\left(-\tfrac12\chi^2\right) d\left(\tfrac12\chi^2\right), \tag{2}$$

which expresses the probability that the value of $\chi^2$ found from a
random sample will fall in the interval $d\chi^2$. This distribution is
often referred to as the $\chi^2$ distribution, and a variate conforming to
it is said to be distributed like $\chi^2$. Since $\chi^2$ is twice a $\gamma(\tfrac12 n)$ variate,
its mean value is n and its modal value n - 2, as proved in § 67.
Similarly, the variance of $\chi^2$ is 2n.
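The mean n, the variance 2n and the modal value n - 2 may be confirmed numerically from the distribution (2), written as a density in $\chi^2$ itself, namely x^(n/2 - 1) exp(-x/2) / (2^(n/2) Gamma(n/2)). The Python sketch below integrates this density by the midpoint rule over a range long enough for the neglected tail to be of no account. The function names, the upper limit and the value of n are illustrative assumptions.

```python
import math


def chi_square_density(x, n):
    """Density of chi-square, regarded as a function of x = chi-square."""
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))


def chi_square_moments(n, upper=120.0, steps=120_000):
    """Total probability, mean and variance by the midpoint rule on (0, upper)."""
    h = upper / steps
    total = first = second = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        f = chi_square_density(x, n)
        total += f
        first += x * f
        second += x * x * f
    total, first, second = total * h, first * h, second * h
    return total, first, second - first ** 2


n = 6  # assumed illustrative number of variates
total, mean, variance = chi_square_moments(n)
print(total, mean, variance)  # close to 1, n and 2n

# The density at n - 2 should exceed its values a little to either side.
mode = n - 2
at_mode = chi_square_density(mode, n)
print(at_mode > chi_square_density(mode - 0.1, n),
      at_mode > chi_square_density(mode + 0.1, n))
```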

The distribution (2) was discovered by Helmert in 1875, and
rediscovered independently in 1900 by Karl Pearson, who devised
by means of it the $\chi^2$ test of 'goodness of fit' which will soon be
considered. We must first, however, examine the effect of one or
more linear relations between the variates $x_i$; and, to make this
clearer, we shall digress briefly to remind the reader of the properties
of an orthogonal linear transformation.

75. Orthogonal linear transformation 

Let the n variates $x_i$ be subjected to the linear transformation

$$\xi_j = \sum_{i=1}^n c_{ji} x_i \quad (j = 1, \dots, n). \tag{3}$$

If the constant coefficients $c_{ji}$ are such that

$$\sum_{j=1}^n \xi_j^2 = \sum_{i=1}^n x_i^2, \tag{4}$$

the transformation is said to be orthogonal. In this case the coefficients
satisfy the relations

$$\sum_i c_{ij}^2 = 1 = \sum_i c_{ji}^2 \tag{5}$$

and

$$\sum_i c_{ij} c_{ik} = 0 = \sum_i c_{ji} c_{ki} \quad (j \ne k). \tag{6}$$

These relations may be expressed verbally by saying that, in the
determinant $|c_{ij}|$ of the coefficients, the sum of the squares of the
elements in any row or in any column is equal to unity, while the
sum of the products of corresponding elements in any two rows or
in any two columns is equal to zero. The determinant of the coefficients
is equal to ±1; and consequently the Jacobian of the $\xi$'s
with respect to the x's is also ±1. By changing the sign of one of
the $\xi$'s, if necessary, we may ensure that the value is +1; and we
shall assume in all cases that this has been done. It follows that

$$\frac{\partial(\xi_1, \dots, \xi_n)}{\partial(x_1, \dots, x_n)} = 1 = \frac{\partial(x_1, \dots, x_n)}{\partial(\xi_1, \dots, \xi_n)}.$$

It is easy to prove that, if the x's are statistically independent
variates, normally distributed about zero with unit S.D., so are
the $\xi$'s. For, the probability that simultaneously the n values of
the variates $x_i$ will fall in the respective intervals $dx_i$ is the product
of the probabilities for the individual variates, and is therefore
given by

$$dP = (2\pi)^{-\frac12 n} \exp\left(-\tfrac12 \sum_i x_i^2\right) dx_1\,dx_2 \dots dx_n.$$

The probability density for the joint distribution of the variates
$x_i$ is therefore

$$(2\pi)^{-\frac12 n} \exp\left(-\tfrac12 \sum_i x_i^2\right) = (2\pi)^{-\frac12 n} \exp\left(-\tfrac12 \sum_j \xi_j^2\right).$$

But the 'volume' of the element bounded by the n pairs of hypersurfaces
$\xi_j$, $\xi_j + d\xi_j$ is

$$\left|\frac{\partial(x_1, \dots, x_n)}{\partial(\xi_1, \dots, \xi_n)}\right| d\xi_1 \dots d\xi_n = d\xi_1 \dots d\xi_n,$$

so that the probability that the $\xi$'s will fall simultaneously in their
respective intervals $d\xi_j$ is expressible as

$$dP = (2\pi)^{-\frac12 n} \exp\left(-\tfrac12 \sum_j \xi_j^2\right) d\xi_1 \dots d\xi_n = \prod_j (2\pi)^{-\frac12} \exp\left(-\tfrac12 \xi_j^2\right) d\xi_j.$$

From the form of this expression it follows that the $\xi$'s are statistically
independent, and are normally distributed about zero with
unit S.D.

The transformation (3) corresponds to a rotation of rectangular
axes in Euclidean space of n dimensions. In the choice of the
coefficients $c_{ji}$ there is a degree of arbitrariness. For instance the
coefficients for $\xi_1$ may be chosen first, subject only to the condition

$$\sum_i c_{1i}^2 = 1,$$

which ensures that the constants $c_{1i}$ are the components of a unit
vector. When $\xi_1$ has been determined $\xi_2$ may be chosen in an infinity
of ways, subject only to the conditions

$$\sum_i c_{2i}^2 = 1, \quad \sum_i c_{1i} c_{2i} = 0,$$

which express that $c_{2i}$ are components of a unit vector, orthogonal
to the vector whose components are $c_{1i}$. At each step there is freedom
of choice of an axis orthogonal to those already chosen; and only
in the case of the nth axis is there no freedom of choice.
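A concrete orthogonal transformation helps to fix ideas. The Python sketch below builds one particular choice of coefficients, commonly called the Helmert matrix, whose first row has every element equal to 1/sqrt(n); this particular matrix is an assumption introduced for illustration and is not prescribed by the text, which leaves the later rows largely arbitrary. The code verifies that the rows are mutually orthogonal unit vectors and that the sum of squares is unchanged by the transformation.

```python
import math


def helmert_matrix(n):
    """One orthogonal n x n matrix whose first row is (1, ..., 1)/sqrt(n)."""
    rows = [[1.0 / math.sqrt(n)] * n]
    for j in range(1, n):
        scale = 1.0 / math.sqrt(j * (j + 1))
        # j equal elements, then -j times that element, then zeros
        rows.append([scale] * j + [-j * scale] + [0.0] * (n - j - 1))
    return rows


def transform(matrix, x):
    """Apply xi_j = sum_i c_ji x_i."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in matrix]


def max_departure_from_orthogonality(matrix):
    """Largest absolute difference between C C' and the unit matrix."""
    n = len(matrix)
    worst = 0.0
    for j in range(n):
        for k in range(n):
            dot = sum(a * b for a, b in zip(matrix[j], matrix[k]))
            target = 1.0 if j == k else 0.0
            worst = max(worst, abs(dot - target))
    return worst


x = [2.0, -1.0, 0.5, 3.0]  # assumed illustrative values
c = helmert_matrix(len(x))
xi = transform(c, x)

x_bar = sum(x) / len(x)
print(max_departure_from_orthogonality(c))           # essentially zero
print(sum(v * v for v in x), sum(v * v for v in xi)) # equal sums of squares
print(xi[0], math.sqrt(len(x)) * x_bar)              # first new variate
print(sum(v * v for v in xi[1:]),
      sum((v - x_bar) ** 2 for v in x))              # remaining squares
```

With this choice the first new variate is sqrt(n) times the mean of the x's, so that the squares of the remaining variates add up to the sum of squares of deviations from the mean.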

76. Linear constraints. Degrees of freedom 

As above let the variates $x_i$ be normally distributed about zero
as mean, with unit S.D. To begin with we assume that they are
functionally independent; but presently we shall impose on them
the linear restriction

$$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = 0, \tag{8}$$

an equation which may be divided throughout by the constant
necessary to make $\sum_i a_i^2 = 1$; and we assume that this has been done.

If the variates $x_i$ are independent, we may obtain by an orthogonal
linear transformation a set of n statistically independent variates
$\xi_i$, each of which is normally distributed about zero as mean, with
unit S.D.; and this may be done so that $\xi_1$ is identical with the first
member of (8). Now let the condition (8) be imposed on the x's.
This is equivalent to putting $\xi_1 = 0$; and, since the variates $\xi_2, \dots, \xi_n$
are statistically independent of $\xi_1$, their distributions are unaltered
by the condition so imposed. Consequently

$$\tfrac12\chi^2 = \tfrac12 \sum_{i=1}^n x_i^2 = \tfrac12 \sum_{i=2}^n \xi_i^2 = \sum_{i=2}^n w_i,$$

where the $w_i$ are Gamma variates, each with parameter $\tfrac12$. Thus $\tfrac12\chi^2$
is the sum of n - 1 independent $\gamma(\tfrac12)$ variates, and is therefore itself
a Gamma variate with parameter $\tfrac12(n-1)$. The distribution of $\chi^2$
is therefore obtained from (2) by putting n - 1 in place of n. The
condition (8) has reduced the number of independent variates by
one. The number of independent variates is usually called the
number of degrees of freedom (D.F.), or briefly the number of freedoms.
The term is borrowed from geometry and mechanics, where the
position of a point or of a body is specified by a number of functionally
independent variables called coordinates. Each independent coordinate
corresponds to one degree of freedom of movement. Any
constraint on the body reduces the number of degrees of freedom.
For this reason a linear relation between the variates $x_i$ is called a
linear constraint. We shall assume that only linear constraints are
involved.

An appeal to geometry throws light on the above reasoning. If
the variables $x_i$ are regarded as rectangular Cartesian coordinates
of the current point P in Euclidean space of n dimensions, the
square of the distance of P from the origin is

$$\chi^2 = \sum_{i=1}^n x_i^2,$$

and, in terms of the alternative set $\xi_i$ of coordinates relative to
rectangular axes through the same origin,

$$\chi^2 = \sum_{i=1}^n \xi_i^2.$$

When the variables are connected by the equation (8), the point P
is constrained to lie on a hyperplane through the origin, determined
by this equation. In terms of the $\xi$'s the equation of the hyperplane
is simply

$$\xi_1 = 0,$$

and for a point P on this hyperplane we have

$$\chi^2 = \sum_{i=2}^n \xi_i^2$$

as above. Thus the linear constraint (8) restricts the freedom of
movement of P to the hyperplane, in which there are only n - 1
independent coordinates. $\chi^2$ is then said to correspond to n - 1 D.F.
That each linear constraint reduces the number of freedoms by
unity may be shown in a similar manner. Thus if there is a second
linear constraint

$$b_1' x_1 + b_2' x_2 + \dots + b_n' x_n = 0, \tag{9}$$

it is expressible in terms of the $\xi$'s in the form

$$b_2 \xi_2 + b_3 \xi_3 + \dots + b_n \xi_n = 0, \tag{10}$$

in which $\sum_{i=2}^n b_i^2 = 1$. We obtain $\chi^2$ as the sum of squares of n - 1
standard normal variates $\xi_i$ (i = 2, ..., n), which are now connected
by the linear constraint (10). Proceeding as before by an appropriate
orthogonal transformation

$$\eta_j = \sum_{i=2}^n c_{ji}' \xi_i \quad (j = 2, \dots, n),$$

in which $\eta_2$ is identical with the first member of (10), we may
express $\chi^2$ as

$$\chi^2 = \sum_{j=3}^n \eta_j^2,$$

and $\tfrac12\chi^2$ is thus a Gamma variate with parameter $\tfrac12(n-2)$, so that
$\chi^2$ corresponds to n - 2 D.F. The argument holds for any number of
linear constraints. Following Yule and Kendall* we shall denote
the number of freedoms by $\nu$. Then, if the n variates are subject to
m linear constraints,

$$\nu = n - m. \tag{11}$$

In place of (2) we thus have for the distribution of $\chi^2$ corresponding
to $\nu$ D.F.,

$$dP = \frac{(\chi^2)^{\frac12(\nu-2)} \exp\left(-\tfrac12\chi^2\right) d(\chi^2)}{2^{\frac12\nu}\, \Gamma(\tfrac12\nu)}. \tag{12}$$

* 1937, 1, p. 415.
Since $\tfrac12\chi^2$ is a $\gamma(\tfrac12\nu)$ variate, the mean value of $\chi^2$ is $\nu$ and its modal
value is $\nu - 2$. The probability that the value of $\chi^2$ from a random
sample will not exceed a fixed value $\chi_0^2$ is obtained by integrating
(12) with respect to $\chi^2$ from 0 to $\chi_0^2$. Similarly, the probability that
$\chi^2$ will exceed $\chi_0^2$ is the integral* of (12) with respect to $\chi^2$ from $\chi_0^2$
to $\infty$. The accompanying Table 3 gives the values of $\chi_0^2$, for values
of $\nu$ from 1 to 30, and for various fixed values of the probability P
of exceeding $\chi_0^2$. In other words, P is the probability that in random
sampling the value of $\chi^2$ shown in the body of the table will be
exceeded.

Example. Show that, for 2 D.F., the probability P of a value of $\chi^2$ greater
than $\chi_0^2$ is $\exp(-\tfrac12\chi_0^2)$, and hence that $\chi_0^2 = 2\log_e(1/P)$.
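The result of this Example gives the row of Table 3 for 2 D.F. in closed form, which affords a convenient check on the table. The Python sketch below evaluates 2 log(1/P) for the tabulated values of P; the function names are illustrative assumptions.

```python
import math


def probability_of_exceeding(chi0_square):
    """P that chi-square with 2 D.F. exceeds chi0_square."""
    return math.exp(-0.5 * chi0_square)


def chi_square_for_probability(p):
    """Value of chi-square with 2 D.F. exceeded with probability p."""
    return 2.0 * math.log(1.0 / p)


for p in (0.99, 0.95, 0.50, 0.30, 0.20, 0.10, 0.05, 0.01):
    print(p, round(chi_square_for_probability(p), 3))

five_per_cent_value = chi_square_for_probability(0.05)
one_per_cent_value = chi_square_for_probability(0.01)
```

The values for P = 0.05 and P = 0.01 are 5.99 and 9.21 to two decimal places, in agreement with the entries of Table 3 for 2 D.F.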



77. Distribution of the 'sum of squares' for a random sample 
from a normal population 

Consider a random sample of n independent values $x_i$ from a
normal population of variance $\sigma^2$. If as usual $\bar{x}$ denotes the mean
of the sample, the sum of squares of the deviations from the mean is

$$\sum_{i=1}^n (x_i - \bar{x})^2 = nS^2. \tag{13}$$

This is the 'sum of squares' to be considered; and we shall prove
that $nS^2/\sigma^2$ is distributed like $\chi^2$ for n - 1 D.F. We know that

$$\sum_i x_i^2 = \sum_i (x_i - \bar{x})^2 + n\bar{x}^2 = nS^2 + n\bar{x}^2. \tag{14}$$

If then we introduce an orthogonal linear transformation (3) of the
variables $x_i$ such† that $\xi_1 = \bar{x}\sqrt{n}$, we have by (14)

$$nS^2 = \sum_{i=1}^n \xi_i^2 - \xi_1^2 = \sum_{i=2}^n \xi_i^2,$$

and therefore

$$nS^2/\sigma^2 = \sum_{i=2}^n \xi_i^2/\sigma^2. \tag{15}$$

Now, with origin at the mean of the population, the x's are normally
distributed about zero as mean with S.D. $\sigma$, and so also are the $\xi$'s.
Consequently $\tfrac12 nS^2/\sigma^2$ is a Gamma variate with parameter $\tfrac12(n-1)$,
and its distribution is

$$dP = \frac{1}{\Gamma(\tfrac12(n-1))} \left(\frac{nS^2}{2\sigma^2}\right)^{\frac12(n-3)} \exp\left(-\frac{nS^2}{2\sigma^2}\right) d\left(\frac{nS^2}{2\sigma^2}\right).$$

In other words $nS^2/\sigma^2$ is distributed like $\chi^2$ with n - 1 D.F.

* For a method of evaluating this integral see Fisher, 1935, 1, pp. 356-7.
See also Ex. IX, 9.

† Cf. Sawkins, 1940, 3, p. 225.

It is convenient to express the result in terms of the statistic $s^2$
of § 54, which is an unbiased estimate of the variance of the population.
Thus

$$nS^2 = (n-1)s^2 = \nu s^2,$$

and the distribution of $s^2$ is therefore

$$dP = \frac{1}{\Gamma(\tfrac12\nu)} \left(\frac{\nu s^2}{2\sigma^2}\right)^{\frac12(\nu-2)} \exp\left(-\frac{\nu s^2}{2\sigma^2}\right) d\left(\frac{\nu s^2}{2\sigma^2}\right), \tag{16}$$

the estimate $s^2$ being based on $\nu$ D.F.

The coefficient of $ds^2$ in (16) is the probability density in the
distribution of $s^2$. Suppose that the population variance is unknown,
and that we enquire what value of it would make the probability
density of $s^2$ a maximum for the given value of $s^2$. This is obtained
by equating to zero its derivative with respect to $\sigma^2$. The procedure
leads to $\sigma^2 = s^2$. For this reason $s^2$ is called the optimum value of
the population variance corresponding to the given sample; and the
method of obtaining it is called the method of maximum likelihood.
It is due to R. A. Fisher.

In the above argument $\xi_1$ is independent of $\xi_2, \dots, \xi_n$ and therefore,
in virtue of (15), $\bar{x}$ is independent of $S^2$. Thus the sampling
distributions of the mean and the variance are independent. And,
since $\xi_1/\sigma$ is a standard normal variate, so is $\bar{x}\sqrt{n}/\sigma$. It follows that
$\bar{x}$ is normally distributed with the same mean as the population,
and with variance $\sigma^2/n$.
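The conclusions of this section may be illustrated by a sampling experiment. The Python sketch below draws repeated samples of n values from a normal population, and examines the quantity nS^2/sigma^2, which should have mean n - 1 and variance 2(n - 1) if it is distributed like chi-square with n - 1 D.F.; it also computes the correlation between the sample mean and S^2, which should be near zero if the two are independent. The function names, the seed and all numerical values are illustrative assumptions, and a vanishing correlation is of course only a necessary consequence of independence.

```python
import math
import random


def simulate_samples(n, mu, sigma, repetitions, seed=3):
    """For each sample return its mean and n S^2 / sigma^2."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    means, scaled_sums = [], []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        sum_of_squares = sum((x - x_bar) ** 2 for x in sample)  # this is n S^2
        means.append(x_bar)
        scaled_sums.append(sum_of_squares / sigma ** 2)
    return means, scaled_sums


def mean_and_variance(values):
    """Mean and (divisor n) variance of a list of numbers."""
    k = len(values)
    mean = sum(values) / k
    variance = sum((v - mean) ** 2 for v in values) / k
    return mean, variance


def correlation(a, b):
    """Product-moment correlation of two equally long lists."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    covariance = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)
    return covariance / math.sqrt(var_a * var_b)


n, mu, sigma = 5, 10.0, 2.0  # assumed illustrative values
xbar_values, scaled_sums = simulate_samples(n, mu, sigma, repetitions=40_000)
ss_mean, ss_variance = mean_and_variance(scaled_sums)
xbar_mean, xbar_variance = mean_and_variance(xbar_values)
r_mean_ss = correlation(xbar_values, scaled_sums)

print(ss_mean, ss_variance)      # near n - 1 = 4 and 2(n - 1) = 8
print(xbar_mean, xbar_variance)  # near mu = 10 and sigma^2 / n = 0.8
print(r_mean_ss)                 # near zero
```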

78. Nature of the chi-square test. An illustration

The $\chi^2$ test is a means of judging the credibility of an hypothesis
concerning the population (or populations) from which the values
of the sample (or samples) are drawn. The hypothesis to be tested
must be of such a nature that we can determine, from the sample
values of the variate, the corresponding value of a certain statistic,
which is distributed like $\chi^2$ for a known number of freedoms. The
table then tells us the probability P that, in random sampling, a
value of this statistic will occur greater than the value actually
obtained. If this probability is very small, we regard the value
obtained as significantly large, and conclude that the hypothesis is
probably incorrect.

TABLE 3. Values of $\chi^2$ with probability P of
being exceeded in random sampling

ν = number of degrees of freedom

  ν    P = 0.99     0.95     0.50     0.30     0.20     0.10     0.05     0.01
  1      0.0002    0.004     0.46     1.07     1.64     2.71     3.84     6.64
  2      0.020     0.103     1.39     2.41     3.22     4.60     5.99     9.21
  3      0.115     0.35      2.37     3.66     4.64     6.25     7.82    11.34
  4      0.30      0.71      3.36     4.88     5.99     7.78     9.49    13.28
  5      0.55      1.14      4.35     6.06     7.29     9.24    11.07    15.09
  6      0.87      1.64      5.35     7.23     8.56    10.64    12.59    16.81
  7      1.24      2.17      6.35     8.38     9.80    12.02    14.07    18.48
  8      1.65      2.73      7.34     9.52    11.03    13.36    15.51    20.09
  9      2.09      3.32      8.34    10.66    12.24    14.68    16.92    21.67
 10      2.56      3.94      9.34    11.78    13.44    15.99    18.31    23.21
 11      3.05      4.58     10.34    12.90    14.63    17.28    19.68    24.72
 12      3.57      5.23     11.34    14.01    15.81    18.55    21.03    26.22
 13      4.11      5.89     12.34    15.12    16.98    19.81    22.36    27.69
 14      4.66      6.57     13.34    16.22    18.15    21.06    23.68    29.14
 15      5.23      7.26     14.34    17.32    19.31    22.31    25.00    30.58
 16      5.81      7.96     15.34    18.42    20.46    23.54    26.30    32.00
 17      6.41      8.67     16.34    19.51    21.62    24.77    27.59    33.41
 18      7.02      9.39     17.34    20.60    22.76    25.99    28.87    34.80
 19      7.63     10.12     18.34    21.69    23.90    27.20    30.14    36.19
 20      8.26     10.85     19.34    22.78    25.04    28.41    31.41    37.57
 21      8.90     11.59     20.34    23.86    26.17    29.62    32.67    38.93
 22      9.54     12.34     21.34    24.94    27.30    30.81    33.92    40.29
 23     10.20     13.09     22.34    26.02    28.43    32.01    35.17    41.64
 24     10.86     13.85     23.34    27.10    29.55    33.20    36.42    42.98
 25     11.52     14.61     24.34    28.17    30.68    34.38    37.65    44.31
 26     12.20     15.38     25.34    29.25    31.80    35.56    38.88    45.64
 27     12.88     16.15     26.34    30.32    32.91    36.74    40.11    46.96
 28     13.56     16.93     27.34    31.39    34.03    37.92    41.34    48.28
 29     14.26     17.71     28.34    32.46    35.14    39.09    42.56    49.59
 30     14.95     18.49     29.34    33.53    36.25    40.26    43.77    50.89

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.

Two conventional values of P are employed in deciding significance,
viz. 0.05 and 0.01. These determine the 5 % and the 1 % levels of
significance respectively. If the value of P obtained is greater than
0.05 we infer that, in more than 5 % of random samples from the
population in question, the value of $\chi^2$ obtained would be greater
than that actually found, which is therefore regarded as not
significantly large. If, however, P is less than 0.05 the value found
is regarded as significant at that level. Similar remarks apply to the
1 % level of significance. Values which are significant at the 1 %
level of probability are said to be highly significant, and are
sometimes distinguished by a double asterisk. Significance at the 5 %
level is then denoted by a single asterisk.

The value of $\chi^2$ obtained from the sample may also be
significantly small. A glance at the probability curve of $\chi^2$ will
help to make this point clear. When the number $\nu$ of degrees of
freedom is greater than 2, this curve has the form indicated in the
diagram, except that when $\nu = 3$ it touches the y-axis at the origin,
and when $\nu = 4$ it touches neither axis at that point. The ordinate
for any value of $\chi^2$ is the probability density for that value; and
this is small for small values of $\chi^2$. When the value of P found
from the table is greater than 0.95, the probability of obtaining a
smaller value of $\chi^2$ is less than 5 %, and the sample value must be
regarded as significantly small. Similarly, when P is greater than 0.99
the probability of a smaller value of $\chi^2$ is less than 1 %, and the
smallness of the sample value is highly significant.

FIG. 8

As an illustration of the $\chi^2$ test we may consider the following.
Let $x_i$ be a random sample of n values from a normal population
of variance $\sigma^2$, $\bar{x}$ the sample mean and

$$nS^2 = \sum_i (x_i - \bar{x})^2.$$

Then we know that $nS^2/\sigma^2$ is distributed like $\chi^2$ with
n - 1 D.F. If then we are given the sample values $x_i$, but have no
certain knowledge of the population from which they were drawn, we may
test the hypothesis that they came from a normal population of S.D.
$\sigma$. For, when $S^2$ has been found from the sample, our hypothesis
gives $nS^2/\sigma^2$ as the sample value of $\chi^2$; and the table then
enables us to decide whether this value is significant or not, that is
to say whether our hypothesis is improbable or not.

Example. A random sample of 12 values gave an unbiased estimate $s^2$ of
the population variance equal to 10.62 mm.² May the sample be reasonably
regarded as from a normal population with variance 7 mm.²?

Here $nS^2 = 11 \times 10.62 = 116.82$, and, according to our hypothesis
concerning the population,

$$\chi^2 = 116.82/7 = 16.7.$$

The number of D.F. is 11. From the table we see that the probability, P,
that the value of $\chi^2$ in such samples will exceed 16.7 is greater
than 0.10. The value found is therefore not significant, and the test
provides no evidence against the hypothesis of a normal population with
variance 7.
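
The arithmetic of this example may be checked numerically. The following Python sketch is offered only as an illustration: the function name `chi2_upper_tail` is our own, and the tail probability is obtained from the standard series for the incomplete gamma function instead of being read from the table.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom exceeds x), for x >= 0."""
    if x <= 0:
        return 1.0
    a, z = 0.5 * df, 0.5 * x
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_k z^k / ((a+1)...(a+k)).
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return 1.0 - lower

n = 12            # sample size
s2 = 10.62        # unbiased estimate of the variance (mm^2)
sigma2 = 7.0      # hypothetical population variance (mm^2)
df = n - 1
chi2 = df * s2 / sigma2       # (n - 1) s^2 / sigma^2, i.e. n S^2 / sigma^2
p = chi2_upper_tail(chi2, df)
print(round(chi2, 1), round(p, 3))
```

The computed probability lies between the tabulated levels 0.10 and 0.20 for 11 D.F., in agreement with the conclusion drawn from the table.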

79. Test of goodness of fit

We shall now consider the $\chi^2$ test of goodness of fit devised by
Karl Pearson. As in Chapter VII let the population be one whose
members may be separated into a number k of classes, and let $p_i$
be the relative frequency of the ith class. In the choice of a simple
sample of n members from this population, the probability at any
drawing that the individual selected will belong to the ith class is
$p_i$. The frequency $f_i$ of this class in sampling has a mean value
$m_i$ given by

$$m_i = n p_i, \tag{17}$$

and the distribution of $f_i$ is the binomial. For large values of n the
binomial distribution approximates to normal; and the sampling
distribution of any class frequency is therefore approximately
normal for large samples.

Next suppose that we are given a set of class frequencies $f_i$,
whose sum is n, without any information about the source of the
values. We may wish to enquire whether they may reasonably be
regarded as those of a simple sample from a certain hypothetical
population. The population is hypothetical in the sense that it is
determined by an hypothesis, which must be such as to enable us to
calculate the expected values $m_i$ of the class frequencies in random
sampling from it. We shall prove that, for large samples, the variate
$\sum_i (f_i - m_i)^2/m_i$ conforms approximately to the $\chi^2$
distribution, and shall show how the number of degrees of freedom is
determined. The value of $\chi^2$ obtained from the sample enables us to
test the credibility of our hypothesis.

Since, for large samples, the class frequency $f_i$ is distributed
approximately normally about $m_i$ as mean, the variate

$$x_i = f_i - m_i \tag{18}$$

is distributed approximately normally about zero as mean. We
require the variance $\sigma_i^2$ of $f_i$ when the class frequencies are
entirely independent; and, to obtain this, we must take account* of the
variation in their sum n. Thus

$$n = \sum_i f_i, \tag{19}$$

and the size n of the sample varies about a mean $\bar{n}$ with S.D.
$\sigma_n$. Since the frequencies $f_i$ are supposed independent, the
variance of n is given by

$$\sigma_n^2 = \sum_i \sigma_i^2. \tag{20}$$

Now, in virtue of §47(8), the variance $\sigma_i^2$ for samples of
varying size is, with $q_i = 1 - p_i$,

$$\sigma_i^2 = \bar{n} p_i q_i + p_i^2 \sigma_n^2. \tag{21}$$

Summing for all the classes we have, by (20),

$$\sigma_n^2 = \bar{n} \sum_i p_i q_i + \sigma_n^2 \sum_i p_i^2,$$

or, since $\sum_i p_i = 1$,

$$\sigma_n^2 \sum_i p_i q_i = \bar{n} \sum_i p_i q_i.$$

Hence $\sigma_n^2 = \bar{n}$, so that (21) is equivalent to

$$\sigma_i^2 = (p_i q_i + p_i^2)\bar{n} = p_i \bar{n} = m_i. \tag{22}$$

The quantity defined by

$$\chi^2 = \sum_i \frac{x_i^2}{\sigma_i^2} = \sum_i \frac{(f_i - m_i)^2}{m_i}, \tag{23}$$

is a sum of squares of standard approximately normal deviates,
and is therefore distributed approximately like $\chi^2$.
* Cf. Fisher, 1922, 1, p. 88.

For samples of fixed size, the number of D.F. is clearly less than
the number k of classes. For the sum of the class frequencies is
constant, and this corresponds to a linear constraint on the variates
$x_i$. Further, to determine the theoretical class frequencies $m_i$, it
is sometimes necessary to estimate parameters of the population
from the data of the sample. For instance, in testing the hypothesis
of a normal population, it may be necessary to estimate the mean
and the variance of the population from the sample values. Each
estimate of a parameter obtained in this manner corresponds to the
introduction of a linear constraint. For the moments about the
mean are linear, or approximately linear, functions of the class
frequencies; and, in equating such a function to a parameter, we
introduce an approximately linear constraint on the variates $x_i$.
In calculating the number of freedoms of $\chi^2$, each constraint
introduced in this manner must be recognized.

The approximation of the binomial distribution to normal, when
n is large, does not hold for very small values of p or q. Hence the
above argument is not valid if one of the class frequencies is small;
for that would make the corresponding relative frequency f/n very
small, and therefore the probability p for that class in sampling
from the population also very small. Classes of small frequency
may be treated by combining two or more of them to form a class
sufficiently large.

80. Numerical examples 

Example 1. Can the wages of 1,000 employees, given in Ex. I, 3 (p. 17), be
regarded as a random sample from a normal population?

Our hypothesis is that the population is normal. Since the sample is large
its mean and its variance are taken as estimates of those of the population.
In Ex. III, 6 (p. 60) the class frequencies per thousand of the normal
population are given. On account of the smallness of the extreme frequencies
we combine the first two classes, and also the last two, leaving 13 classes.
Since the sum of the class frequencies is constant, and two of the parameters
were estimated from the sample, the number of degrees of freedom is 10. The
value of $\chi^2$ from the sample is



From the table we see that, for 10 D.F., this value of $\chi^2$ is
significant at the 5 % level. We conclude that the assumption of a normal
population is probably incorrect.

Example 2. From the adult male populations of seven large cities, random
samples of the sizes indicated below were taken, and the numbers of married
and single men recorded. Do the data indicate any significant variation
among the cities in the tendency of men to marry?

City...     A     B     C     D     E     F     G   Total

Married   133   164   155   106   153   123   146     980
Single     36    57    40    37    55    33    36     294
Total     169   221   195   143   208   156   182    1274

We test the hypothesis that there is no significant variation in the
tendency mentioned. Then the men from each city may be regarded as a simple
sample from a population in which the ratio of married men to single is
approximately the same as in the column of totals. This ratio is 10 : 3. The
theoretical frequencies for any city are then obtained by dividing the total
for that city into two parts in this ratio. These frequencies are:

City...     A     B     C     D     E     F     G   Total

Married   130   170   150   110   160   120   140     980
Single     39    51    45    33    48    36    42     294
Total     169   221   195   143   208   156   182    1274

From these figures we have

$$\chi^2 = \frac{3^2}{130} + \frac{6^2}{170} + \frac{5^2}{150} + \dots + \frac{6^2}{42} = 5.34.$$

To find the number of freedoms we observe that the sum of the frequencies
of married and single men from any city is constant, being equal to the size
of the sample from that city. This reduces the number of independent
frequencies to 7. And further, a parameter of the population was estimated
from the sample, namely, the ratio of the numbers of married and single men.
Consequently $\nu = 6$. For this number of freedoms the probability of
obtaining a larger value of $\chi^2$ than 5.34 is about 0.50. The value is
therefore not significant, and the test furnishes no evidence against the
hypothesis.
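
The calculation of Example 2 may be retraced step by step. The Python sketch below is an illustration only; the variable names are our own. It forms the theoretical frequencies from the pooled ratio, sums the fourteen contributions to $\chi^2$, and uses the finite closed form of the upper tail that is available when the number of D.F. is even.

```python
import math

married = [133, 164, 155, 106, 153, 123, 146]
single = [36, 57, 40, 37, 55, 33, 36]

totals = [m + s for m, s in zip(married, single)]
grand = sum(totals)
share_married = sum(married) / grand      # 980 / 1274 = 10 / 13

chi2 = 0.0
for m_obs, s_obs, n_city in zip(married, single, totals):
    m_exp = n_city * share_married        # theoretical married frequency
    s_exp = n_city - m_exp                # theoretical single frequency
    chi2 += (m_obs - m_exp) ** 2 / m_exp + (s_obs - s_exp) ** 2 / s_exp

df = len(totals) - 1    # 7 independent frequencies less 1 estimated ratio

# For an even number of D.F. the upper tail is a finite sum:
# P = exp(-z) * sum_{k < df/2} z^k / k!,  with z = chi2 / 2.
z = chi2 / 2
p = math.exp(-z) * sum(z ** k / math.factorial(k) for k in range(df // 2))
print(round(chi2, 2), round(p, 2))
```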

Example 3. May the data in Ex. III, 7 (p. 61) be regarded as those of a
random sample from a Poissonian distribution?

The mean, m = 1.2, was estimated from the sample, and from it the
theoretical frequencies per thousand of the Poissonian distribution were
calculated in the example referred to. To apply the $\chi^2$ test, we
combine the last three classes. Then

$$\chi^2 = \frac{(3.8)^2}{301.2} + \frac{(3.6)^2}{361.4} + \dots + \frac{(4.4)^2}{7.6} = 3.5.$$
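
The theoretical frequencies appearing in the denominators above can be regenerated from the estimated mean. The following Python sketch is illustrative only. It assumes that the original classes were 0, 1, ..., 7 occurrences, so that combining the last three leaves six classes; that assumption is ours and is not stated in the present section.

```python
import math

m = 1.2        # mean estimated from the sample
per = 1000     # frequencies are expressed per thousand

# Poisson frequencies per thousand for 0, 1, ..., 7 occurrences
# (the upper limit 7 is an assumption made for this illustration).
freq = [per * math.exp(-m) * m ** k / math.factorial(k) for k in range(8)]

# Combine the last three classes, leaving six.
classes = freq[:5] + [sum(freq[5:])]

for k, value in enumerate(classes):
    label = str(k) if k < 5 else "5-7"
    print(label, round(value, 1))
```

The first two values agree with the denominators 301.2 and 361.4 quoted above. The combined class comes out near 7.7, whereas the text has 7.6, a difference that is presumably due to rounding the separate frequencies before adding them.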

Here the number of classes is 6; but the total frequency is constant, and m
was estimated from the sample, so that $\nu = 4$. With 4 D.F. the probability
that $\chi^2$ will exceed 3.5 is nearly 0.50. The value is therefore not at
all significant, and the assumption of a Poissonian population is not
discredited.

81. Additive property of chi-square

THEOREM I. If the independent variates x and y conform to the $\chi^2$
distribution, with $\nu_1$ and $\nu_2$ D.F. respectively, then x + y is
distributed like $\chi^2$ with $\nu_1 + \nu_2$ D.F.

For $\frac12 x$ and $\frac12 y$ are independent Gamma variates with
parameters $\frac12\nu_1$ and $\frac12\nu_2$ respectively. Therefore, by
§68, Theorem II, $\frac12(x + y)$ is a Gamma variate with parameter
$\frac12(\nu_1 + \nu_2)$. Consequently x + y conforms to the $\chi^2$
distribution with $\nu_1 + \nu_2$ D.F.

Similarly, corresponding to §68, Theorem III, we may state

THEOREM II. If the sum of two independent positive variates is
distributed like $\chi^2$ with $\nu_1 + \nu_2$ D.F., and one of them is
distributed like $\chi^2$ with $\nu_1$ D.F., then the other is distributed
like $\chi^2$ with $\nu_2$ D.F.

In the same way Theorem V of §69 and Theorem VIII of §73
may be expressed as

THEOREM III. If the independent variates x and y are distributed
like $\chi^2$ with $\nu_1$ and $\nu_2$ D.F. respectively, then x/(x + y) is
a Beta variate of the first kind with parameters $\frac12\nu_1$ and
$\frac12\nu_2$, while x/y is a Beta variate of the second kind with the
same parameters.

82. Samples from an uncorrelated bivariate normal population.
Distribution of the correlation coefficient

The distribution of the coefficient of correlation, r, in samples
from a normally correlated bivariate population was given by
Fisher* in 1915. In the particular case of an uncorrelated population
($\rho = 0$) the distribution of r is very simple. Let $\sigma_1$ and
$\sigma_2$ be the S.D.'s of x and y in the population, and let the
variates be measured from their means. Consider a random sample of n
pairs of values $x_i$, $y_i$ from such a population. The variances
$S_1^2$, $S_2^2$ of x, y in the sample are given by

$$nS_1^2 = \sum_i (x_i - \bar{x})^2, \qquad nS_2^2 = \sum_i (y_i - \bar{y})^2,$$

and the correlation, r, in the sample is

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{nS_1S_2}.$$

* Fisher, 1915, 2. A simple and excellent proof of a different character
has recently been published by Sawkins (1944, 1). The proof for $\rho = 0$
given in this section is based on that of Sawkins.

Owing to the nature of the sampling the n values $y_i$ are independent.
Let these be subjected to an orthogonal transformation
yielding n variates $\eta_i$, the first of which may be taken as

$$\eta_1 = \frac{1}{\sqrt{n}}(y_1 + y_2 + \dots + y_n) = \sqrt{n}\,\bar{y},$$

since the sum of squares of the coefficients of the $y_i$ is unity. Then

$$\sum_{i=1}^{n} \eta_i^2 = \sum_{i=1}^{n} y_i^2 = nS_2^2 + n\bar{y}^2 = nS_2^2 + \eta_1^2,$$

so that

$$nS_2^2 = \sum_{i=2}^{n} \eta_i^2, \tag{i}$$

and this sum, divided by $\sigma_2^2$, is distributed like $\chi^2$ with
n - 1 D.F. Further, from the above definition of r,

$$\sqrt{n}\,rS_2 = \frac{1}{\sqrt{n}\,S_1} \sum_i (x_i - \bar{x})\,y_i.$$

We may take this sum for the second variate, $\eta_2$; for the sum of
the squares of the coefficients of the $y_i$ is unity, the orthogonal
condition is satisfied by the coefficients of $\eta_1$ and $\eta_2$, and the
variables x and y are independent. Since $\eta_2^2 = nr^2S_2^2$ it follows
from (i) that $\sum_{i=3}^{n} \eta_i^2 = n(1 - r^2)S_2^2$. Further, the
$\eta_i/\sigma_2$ are independent standard normal variates. Thus
$nr^2S_2^2/\sigma_2^2$ and $n(1 - r^2)S_2^2/\sigma_2^2$ are distributed
independently like $\chi^2$ with 1 and n - 2 D.F. respectively; or we may
express it by saying that the former is a $\chi_1^2$ and the latter a
$\chi_{n-2}^2$. It follows that

$$r^2 = \frac{nr^2S_2^2}{nS_2^2} = \frac{nr^2S_2^2/\sigma_2^2}{nr^2S_2^2/\sigma_2^2 + n(1 - r^2)S_2^2/\sigma_2^2} = \frac{\chi_1^2}{\chi_1^2 + \chi_{n-2}^2},$$

and therefore, by §81, Theorem III, $r^2$ is a
$\beta_1(\frac12, \frac12(n - 2))$ variate, with distribution

$$dP = \frac{(r^2)^{-\frac12}(1 - r^2)^{\frac12(n-4)}}{B(\frac12, \frac12(n - 2))}\,d(r^2). \tag{24}$$

We may thus state the important

THEOREM IV. For random samples of n pairs of values from an
uncorrelated bivariate normal population, $r^2$ is a Beta variate of the
first kind with parameters $\frac12$ and $\frac12(n - 2)$.

The expected value of $r^2$ for such samples is therefore 1/(n - 1),
and the S.E. of r is $1/\sqrt{n - 1}$. And since r ranges from -1 to 1,
the opposite values $\pm r$ giving the same value of $r^2$, the
distribution of r is

$$dP = \frac{(1 - r^2)^{\frac12(n-4)}}{B(\frac12, \frac12(n - 2))}\,dr. \tag{25}$$
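
As a check on (25), the density may be integrated numerically to confirm that its total area is unity and that the mean value of $r^2$ is 1/(n - 1). The Python sketch below is an illustration under our own choices: the sample size n = 10, the function names and the use of Simpson's rule are not taken from the text.

```python
import math

def r_density(r, n):
    """Ordinate of the distribution of r for n pairs when rho = 0."""
    beta = math.gamma(0.5) * math.gamma(0.5 * (n - 2)) / math.gamma(0.5 * (n - 1))
    return (1.0 - r * r) ** (0.5 * (n - 4)) / beta

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

n = 10   # illustrative sample size
area = simpson(lambda r: r_density(r, n), -1.0, 1.0)
mean_r2 = simpson(lambda r: r * r * r_density(r, n), -1.0, 1.0)
print(area, mean_r2, 1 / (n - 1))
```

Simpson's rule is adequate here because the density is a polynomial in r when n is even; for small odd n the ordinate has an unbounded slope at the ends of the range and a finer rule would be needed.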



In considering the linear regression of y on x we proved the
relation (cf. §27(31))

$$\sum_i (y_i - \bar{y})^2 = \sum_i (y_i - Y_i)^2 + \sum_i (Y_i - \bar{y})^2, \tag{26}$$

where Y is the estimate of y from the regression equation. In the
present notation this corresponds to the identity

$$nS_2^2 = n(1 - r^2)S_2^2 + nr^2S_2^2. \tag{27}$$

We have just seen that, for samples from the above population,
the two sums in the second member are independent. Thus the sum of
squares of deviations from the line of regression is distributed
independently of the sum of squares of deviations due to regression in
the sample. When divided by $\sigma_2^2$ the three sums in (27) are
distributed like $\chi^2$ with n - 1, n - 2 and 1 D.F. respectively,
illustrating the additive property of $\chi^2$ stated in §81.

83. Distribution of regression coefficients and correlation ratios

The distribution of the linear regression coefficient, b, of y on x
follows from the above results. For

$$b = rS_2/S_1,$$

so that

$$\frac{b^2\sigma_1^2}{\sigma_2^2} = \frac{nr^2S_2^2/\sigma_2^2}{nS_1^2/\sigma_1^2}.$$

As we have just seen, the numerator and denominator of the second
member are distributed independently like $\chi^2$ with 1 and n - 1 D.F.
respectively. Consequently, by Theorem III, the quotient is a
$\beta_2(\frac12, \frac12(n - 1))$ variate, and therefore
$b^2\sigma_1^2/\sigma_2^2$ is a variate of the same type. Hence we may
state

THEOREM V. In random samples of n pairs of values from an
uncorrelated bivariate normal population, in which the standard
deviations of the variables are $\sigma_1$ and $\sigma_2$, the linear
regression coefficient b of y on x is such that
$b^2\sigma_1^2/\sigma_2^2$ is a Beta variate of the second kind with
parameters $\frac12$ and $\frac12(n - 1)$.

Since the mean value of a $\beta_2(l, m)$ variate is l/(m - 1), the
expected value of $b^2$ in sampling is $\sigma_2^2/\{(n - 3)\sigma_1^2\}$.
And, since the range of b is from $-\infty$ to $+\infty$, the distribution
of b is

$$dP = \frac{\sigma_1}{\sigma_2\,B(\frac12, \frac12(n - 1))} \left(1 + \frac{\sigma_1^2 b^2}{\sigma_2^2}\right)^{-\frac12 n} db. \tag{28}$$


The distribution of the correlation ratio, $\eta$, of y on x may be
determined from the corresponding resolution of the sum of squares.
With the notation of §34 let the subscript i distinguish the particular
array of y's, while j indicates position in that array, $\bar{y}_i$
denoting the mean of the y's in the ith vertical array, and $n_i$ the
frequency in that array. Then, in virtue of §34, we have the resolution

$$\sum_i \sum_j (y_{ij} - \bar{y})^2 = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2, \tag{29}$$

the various sums being equal to the corresponding terms of the
identity

$$nS_2^2 = n(1 - \eta^2)S_2^2 + n\eta^2S_2^2. \tag{30}$$

Now the first member of (29), divided by $\sigma_2^2$, is distributed
like $\chi^2$ with n - 1 D.F. Further, since $\sum_j$ denotes summation
over the values of any particular array,
$\sum_j (y_{ij} - \bar{y}_i)^2/\sigma_2^2$ is distributed like $\chi^2$
with $n_i - 1$ D.F. Summing for all the h arrays we see that the first sum
in the second member of (29), divided by $\sigma_2^2$, is a $\chi^2$ with
$\sum_i (n_i - 1)$ or n - h D.F. And, because the means of the arrays are
distributed independently of their variances, it follows from §81,
Theorem II, that the last sum in (29), divided by $\sigma_2^2$, is a
$\chi^2$ with h - 1 D.F.* We may therefore write

$$\eta^2 = \frac{n\eta^2S_2^2}{nS_2^2} = \frac{\chi_{h-1}^2}{\chi_{h-1}^2 + \chi_{n-h}^2},$$

* See also Ex. 7 at the end of this chapter.

in which the subscript indicates the number of D.F. for the $\chi^2$.
Thus, in virtue of §81, Theorem III, $\eta^2$ is a
$\beta_1(\frac12(h - 1), \frac12(n - h))$ variate, and we may state

THEOREM VI. For random samples of n pairs of values from an
uncorrelated bivariate normal population, the square of the correlation
ratio of y on x is a Beta variate of the first kind with parameters
$\frac12(h - 1)$ and $\frac12(n - h)$, where h is the number of arrays of
y's.

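The distribution of $\eta^2$ is thus

$$dP = \frac{(\eta^2)^{\frac12(h-3)}(1 - \eta^2)^{\frac12(n-h-2)}}{B(\frac12(h - 1), \frac12(n - h))}\,d(\eta^2), \tag{31}$$

and its mean value,* by §69 (23), is

$$E(\eta^2) = \frac{h - 1}{n - 1}.$$

EXAMPLES IX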

1. Apply the $\chi^2$ test to show that the deaths of centenarians
recorded in the example of §18 may reasonably be regarded as a
random sample from a Poissonian population.

2. A certain hypothesis was tested by two similar experiments,
which gave $\chi^2 = 14.7$ for $\nu = 9$ and $\chi^2 = 14.9$ for
$\nu = 11$. Show that the two experiments combined give less reason for
confidence in the hypothesis than either experiment alone.
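
A numerical check of Ex. 2 may be sketched as follows, using the additive property of §81 to combine the two results into a single $\chi^2$ of 29.6 with 20 D.F. The Python code is illustrative only: the function name `chi2_upper_tail` is our own, and the tail probability is computed from the usual series for the incomplete gamma function.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom exceeds x), for x >= 0."""
    if x <= 0:
        return 1.0
    a, z = 0.5 * df, 0.5 * x
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return 1.0 - math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total

p9 = chi2_upper_tail(14.7, 9)                       # first experiment
p11 = chi2_upper_tail(14.9, 11)                     # second experiment
p_combined = chi2_upper_tail(14.7 + 14.9, 9 + 11)   # sum of the two, by Theorem I

print(round(p9, 3), round(p11, 3), round(p_combined, 3))
```

The combined probability falls below each of the separate ones, which is the sense in which the two experiments together give less reason for confidence in the hypothesis.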

* Cf. Fisher, 1922, 2, pp. 604-5.

3. Prove by mathematical induction that the sum of the squares
of n independent standard normal variates conforms to the $\chi^2$
distribution (2) of §74. (See Fisher, 1935, 1, pp. 353-4.)

4. Geometrical proof of the $\chi^2$ distribution. The following
geometrical proof, that the sum of the squares of n independent standard
normal variates $x_i$ has a distribution given by (2) of §74, is
due to Fisher (1935, 1, p. 354). As in §74 the probability that the
values of the variates will fall simultaneously in the respective
intervals $dx_i$ is

$$dp = (2\pi)^{-\frac12 n} \exp(-\tfrac12\chi^2)\,dx_1\,dx_2 \dots dx_n, \tag{i}$$

where $\chi^2 = \sum_i x_i^2$.

Let us regard the $x_i$ as coordinates of the current point P in
Euclidean space of n dimensions. Then dp is the probability that P will
fall in the element of volume $dx_1\,dx_2 \dots dx_n$. The coefficient of
this element of volume in (i) is therefore the probability density for
this space, and it is proportional to $\exp(-\frac12\chi^2)$. Now we can
express, in terms of $\chi$ and $d\chi$, an element of volume in which
the value of $\chi$ may be regarded as constant. For, since $\chi^2$ is
the square of the distance of P from the origin, $\chi$ is constant over
the surface of a hypersphere with radius $\chi$ and centre at the origin.
The volume enclosed by this hypersphere is proportional to $\chi^n$; and
the element of volume between this and the adjacent hypersphere of radius
$\chi + d\chi$ is proportional to $d(\chi^n)$, that is to
$\chi^{n-1}\,d\chi$. Using the above value of the probability density, we
see that the probability that P will fall in the region bounded by the two
hyperspheres is proportional to
$\chi^{n-1}\exp(-\frac12\chi^2)\,d\chi$; and this is the probability that
the value of $\chi$ from the random sample will fall in the interval
$d\chi$. The above probability is clearly proportional to

$$(\tfrac12\chi^2)^{\frac12 n - 1} \exp(-\tfrac12\chi^2)\,d(\tfrac12\chi^2),$$

and, since $\chi^2$ must lie between 0 and $+\infty$, the constant factor
is $1/\Gamma(\frac12 n)$, so that the integral of the probability
throughout this range may be unity. Since $\chi^2$ lies in the interval
$d\chi^2$ when $\chi$ lies in the interval $d\chi$, it follows that the
distribution of $\chi^2$ is given by §74(2).

For still another proof of the $\chi^2$ distribution see Plummer, 1940,
1, p. 246.

5. Homogeneity of several estimates of the population variance.
Suppose that k independent samples have furnished estimates $s_i^2$ of
the population variance, based on $\nu_i$ (i = 1, ..., k) D.F. Are these
estimates such that the samples may be regarded as drawn from
the same population? In other words, are the estimates homogeneous?

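On the hypothesis that the samples are from the same population,
an unbiased estimate $s^2$ of the variance of the population is

$$s^2 = \frac{1}{\nu} \sum_i \nu_i s_i^2, \qquad \nu = \sum_i \nu_i,$$

since the expected value of this quantity is the variance of the
population, in virtue of §54(8). Bartlett* has shown that the
statistic

$$\chi^2 = \frac{\nu \log_e s^2 - \sum_i \nu_i \log_e s_i^2}{1 + \dfrac{1}{3(k - 1)}\left(\sum_i \dfrac{1}{\nu_i} - \dfrac{1}{\nu}\right)}$$

is distributed approximately as $\chi^2$ with k - 1 D.F. The value of
$\chi^2$ calculated from the data will tell whether the hypothesis of
homogeneity is reasonable.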

6. Show that the estimates 3.8, 4.4, 8.1, 6.1, and 9.4 of the
population variance, based on 5, 8, 6, 7 and 4 D.F. respectively, may
be regarded as homogeneous according to the test of Ex. 5 (k = 5,
$\chi^2 = 1.45$).
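
The figure quoted in Ex. 6 may be approached numerically. The Python sketch below assumes the usual form of Bartlett's statistic, namely $M/C$ with $M = \nu \log_e s^2 - \sum \nu_i \log_e s_i^2$ and $C = 1 + \{\sum 1/\nu_i - 1/\nu\}/\{3(k - 1)\}$, where $\nu = \sum \nu_i$ and $s^2$ is the pooled estimate. This form is an assumption made for the illustration, and the variable names are our own.

```python
import math

s2 = [3.8, 4.4, 8.1, 6.1, 9.4]   # estimates of the population variance
nu = [5, 8, 6, 7, 4]             # degrees of freedom of each estimate

k = len(s2)
nu_total = sum(nu)
pooled = sum(v * s for v, s in zip(nu, s2)) / nu_total   # pooled estimate s^2

# Assumed form of Bartlett's statistic: M divided by a correction factor C.
M = nu_total * math.log(pooled) - sum(v * math.log(s) for v, s in zip(nu, s2))
C = 1 + (sum(1 / v for v in nu) - 1 / nu_total) / (3 * (k - 1))
chi2 = M / C

print(round(pooled, 2), round(chi2, 2))
```

Under this assumed form the statistic comes out at about 1.46, close to the value 1.45 quoted in the exercise. With k - 1 = 4 D.F. such a value lies between the tabulated entries for P = 0.95 and P = 0.50, so that the estimates give no ground for doubting homogeneity.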

7. That the sum $\sum_i n_i(\bar{y}_i - \bar{y})^2/\sigma_2^2$ of §83 is
distributed like $\chi^2$ with h - 1 D.F. may also be shown as follows.
Since $\bar{y}$ is the weighted mean of the $\bar{y}_i$ we have

$$\sum_i n_i(\bar{y}_i - \mu')^2 = \sum_i n_i(\bar{y}_i - \bar{y})^2 + n(\bar{y} - \mu')^2. \tag{ii}$$

Now $\bar{y}_i - \mu'$ is distributed normally about zero as mean with
variance $\sigma_2^2/n_i$. Consequently
$n_i(\bar{y}_i - \mu')^2/\sigma_2^2$ is a $\chi^2$ with 1 D.F.; and,
on summing for all the arrays, we see that the first member of (ii)
divided by $\sigma_2^2$ is a $\chi^2$ with h D.F. Similarly,
$\bar{y} - \mu'$ is distributed normally about zero as mean with variance
$\sigma_2^2/n$; and the last term in (ii), divided by $\sigma_2^2$, is
therefore a $\chi^2$ with 1 D.F. Theorem II of §81 then shows that the
first sum on the right of (ii), divided by $\sigma_2^2$, is a $\chi^2$
with h - 1 D.F., as stated.

* Proc. Roy. Soc. A, vol. 160, 1937, pp. 268-82. See also Neyman and
Pearson, 1931, 3.

8. Taking the correlation ratio as positive, deduce from 83 (31) 
that the mean value of the correlation ratio of y on x, for samples 
from an uncorrelated bivariate normal population, is 



9. Integrating by parts show that, if r is an even positive integer, 



and, by continuing the process, show that the value of the integral is 



Hence show that the probability, in random sampling, that the
value of $\chi^2$ with $\nu$ (even) D.F. will exceed $\chi_0^2$ is
obtained from this expression by putting $\beta = \frac12\chi_0^2$ and
$r = \frac12(\nu - 2)$. (Cf. Fisher, 1935, 1, pp. 356-7.)

10. The variables x, y are normally correlated with coefficient $\rho$.
Show that u and v, defined by

$$u = x/\sigma_1 + y/\sigma_2, \qquad v = x/\sigma_1 - y/\sigma_2,$$

are independent normal variates with variances $2(1 + \rho)$ and
$2(1 - \rho)$ respectively; and that, if R is the correlation between u
and v in random samples of n pairs from the bivariate population, $R^2$
is a $\beta_1(\frac12, \frac12 n - 1)$ variate.



CHAPTER X 

FURTHER TESTS OF SIGNIFICANCE. 
SMALL SAMPLES 

84. Small samples 

The use of the standard errors of statistics in the tests of Chapters
VI and VII depends upon the fact that, in the case of large samples,
the sampling distributions of many statistics are approximately
normal, or at any rate unimodal with the property that a value of
the statistic, deviating from its mean by more than two or three
times its S.E., is very unlikely. For small samples, however, the
distributions of statistics are often far from normal. Moreover,
the estimate of a parameter of the population made from a small
sample is not at all reliable. For these reasons the use of standard
errors in connection with such samples is very limited. The chief
concern of the theory of small samples is with the distributions of
various statistics, and the applications of tests of significance based
upon these distributions. In each application we test a hypothesis
concerning the source of the sample. This hypothesis may be, for
instance, that a certain parameter of the population has a specified
value, or that two given samples were drawn from the same population.
In each instance the test employed enables us to form a conclusion
based on considerations of probability. The nature of the
tests is illustrated by the $\chi^2$ test of §78, which is applicable to
samples of any size. But, as already indicated, the test of goodness
of fit considered in §79 can be applied only to large samples.

We remind the reader of two important results obtained earlier.
First, in simple sampling from a population with mean $\mu$ and
variance $\sigma^2$, the distribution of the mean $\bar{x}$ of the sample
of n members has $\mu$ for its mean and $\sigma^2/n$ for its variance
(see §50). This result holds whether the sample is large or small, and
whether the population is normal or not. In the case of a normal
population the distribution of $\bar{x}$ is also normal (see §22 or §82).
Secondly, the statistic $s^2$ defined by

$$(n - 1)s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{1}$$

is an unbiased estimate of the population variance $\sigma^2$, in the
sense that $E(s^2) = \sigma^2$ (cf. §54(8)). This estimate of $\sigma^2$
is said to be based on n - 1 D.F., since the variates $x_i - \bar{x}$ are
not independent, being connected by the linear relation
$\sum_i (x_i - \bar{x}) = 0$. The number of independent variates is thus
only n - 1; and this agrees with the result of §77, that
$(n - 1)s^2/\sigma^2$ is distributed like $\chi^2$ with n - 1 D.F.

The following discussion applies to samples of any size. It is
assumed, however, that the populations from which the samples
are drawn are normal. The results obtained are therefore strictly
true only in this case. But they are approximately true, and may be
usefully applied, in most cases in which the departure of the population
from normality is not very marked.

'STUDENT'S' DISTRIBUTION

85. The statistic t and its distribution

We have seen that, in simple sampling from a normal population
of mean $\mu$ and variance $\sigma^2$, the deviation $\bar{x} - \mu$ is a
normal variate, with mean zero and S.D. $\sigma/\sqrt{n}$. The quotient
of $\bar{x} - \mu$ by $\sigma/\sqrt{n}$ is therefore distributed normally
with unit S.D. If, however, in place of the constant $\sigma$ we use the
variable estimate s obtained from the sample, we have the statistic

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \tag{2}$$

which is not normally distributed. The distribution of t was first
found by W. S. Gosset, who wrote under the nom de plume of
'Student'. It follows very simply from the theorems of §67, §73
and §77. For, from its definition,

$$t^2 = \frac{n(\bar{x} - \mu)^2}{s^2} = \frac{\nu\,n(\bar{x} - \mu)^2}{nS^2},$$

where $nS^2 = \sum_i (x_i - \bar{x})^2$ and $\nu$ is the number of D.F.,
n - 1, on which the estimate $s^2$ is based. Hence

$$\frac{t^2}{\nu} = \frac{n(\bar{x} - \mu)^2/2\sigma^2}{nS^2/2\sigma^2}.$$

But, in virtue of §67 (Theorem I) and §77, the numerator in the
second member is a Gamma variate with parameter $\frac12$, and the
denominator a Gamma variate with parameter $\frac12\nu$; and these are
distributed independently of each other (cf. §82 or Ex. VIII, 11).
Consequently $t^2/\nu$ is a $\beta_2(\frac12, \frac12\nu)$ variate; and
its distribution is obtained by substituting $t^2/\nu$ for x in §72(27),
with $l = \frac12$ and $m = \frac12\nu$. Thus the probability that, in
random sampling, the value of $t^2$ will fall in the interval $dt^2$ is

$$dP = \frac{(t^2)^{-\frac12}\,d(t^2)}{\sqrt{\nu}\,B(\frac12, \frac12\nu)\,(1 + t^2/\nu)^{\frac12(\nu+1)}}. \tag{3}$$

By integrating with respect to $t^2$ from a fixed value $t_0^2$ to
infinity,* we obtain the probability that the value of $t^2$ will exceed
$t_0^2$. The accompanying table contains extracts from a more complete
one given by Fisher. In the body of the table are given the values of t
corresponding to certain fixed values of $\nu$ and P. Thus, for a
specified number $\nu$ of D.F., the value of P at the top of the column
is the probability that, in random sampling, the numerical value of t in
the body of the table will be exceeded.

The range of $t^2$ is from 0 to $\infty$, and its distribution is given
by (3). The statistic t, however, ranges from $-\infty$ to $+\infty$, and
its distribution is therefore

$$dP = \frac{dt}{\sqrt{\nu}\,B(\frac12, \frac12\nu)\,(1 + t^2/\nu)^{\frac12(\nu+1)}}, \tag{4}$$

the factor 2 disappearing, since the integral of dP over the whole
range of variation of t must be unity. The distribution (4) is spoken
of as the t distribution corresponding to $\nu$ D.F. And from the above
argument it is clear that any statistic t, whose range is from $-\infty$
to $+\infty$, and which is such that $t^2/\nu$ is a
$\beta_2(\frac12, \frac12\nu)$ variate, conforms to the t distribution for
$\nu$ D.F. This important result may also be expressed in the form of

THEOREM I. A statistic t conforms to the t distribution for $\nu$ D.F. if
its range is from $-\infty$ to $+\infty$, and $t^2/\nu$ is expressible as
the quotient of two independent variates, which are distributed like
$\chi^2$ with 1 and $\nu$ D.F. respectively.
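
The probability that a given numerical value of t will be exceeded may be obtained from the distribution of t by quadrature. The Python sketch below is an illustration only. It takes the ordinate of the t distribution for $\nu$ D.F. in its standard form, $(1 + t^2/\nu)^{-\frac12(\nu+1)}/\{\sqrt{\nu}\,B(\frac12, \frac12\nu)\}$; the function names and the choice of Simpson's rule are our own, and this is not the method of evaluation referred to in the footnote.

```python
import math

def t_density(t, nu):
    """Ordinate of the t distribution for nu degrees of freedom."""
    beta = math.gamma(0.5) * math.gamma(0.5 * nu) / math.gamma(0.5 * (nu + 1))
    return (1.0 + t * t / nu) ** (-0.5 * (nu + 1)) / (math.sqrt(nu) * beta)

def prob_exceeding(t0, nu, steps=2000):
    """P(|t| > t0), from Simpson's rule applied to the density on [0, |t0|]."""
    upper = abs(t0)
    h = upper / steps
    total = t_density(0.0, nu) + t_density(upper, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 1.0 - 2.0 * total * h / 3.0

# Two standard tabulated values serve as a check on the quadrature.
print(round(prob_exceeding(2.23, 10), 3))   # 5 % point for 10 D.F.
print(round(prob_exceeding(0.706, 8), 3))   # 50 % point for 8 D.F.
```

The two printed probabilities should lie close to 0.05 and 0.50 respectively, these being the levels at which 2.23 (10 D.F.) and 0.706 (8 D.F.) are commonly tabulated.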

* For a method of evaluating this integral see Fisher, 1935, 1, pp. 358-60.

TABLE 4. Values of |t| with a probability P
of being exceeded in random sampling

ν = number of degrees of freedom

  ν    P = 0.50    0.10    0.05    0.02    0.01

  1      1.000     6.31   12.71   31.82   63.66
  2      0.816     2.92    4.30    6.96    9.92
  3      0.765     2.35    3.18    4.54    5.84
  4      0.741     2.13    2.78    3.75    4.60
  5      0.727     2.02    2.57    3.36    4.03
  6      0.718     1.94    2.45    3.14    3.71
  7      0.711     1.90    2.36    3.00    3.50
  8      0.706     1.86    2.31    2.90    3.36
  9      0.703     1.83    2.26    2.82    3.25
 10      0.700     1.81    2.23    2.76    3.17
 11      0.697     1.80    2.20    2.72    3.11
 12      0.695     1.78    2.18    2.68    3.06
 13      0.694     1.77    2.16    2.65    3.01
 14      0.692     1.76    2.14    2.62    2.98
 15      0.691     1.75    2.13    2.60    2.95
 16      0.690     1.75    2.12    2.58    2.92
 17      0.689     1.74    2.11    2.57    2.90
 18      0.688     1.73    2.10    2.55    2.88
 19      0.688     1.73    2.09    2.54    2.86
 20      0.687     1.72    2.09    2.53    2.84
 21      0.686     1.72    2.08    2.52    2.83
 22      0.686     1.72    2.07    2.51    2.82
 23      0.685     1.71    2.07    2.50    2.81
 24      0.685     1.71    2.06    2.49    2.80
 25      0.684     1.71    2.06    2.48    2.79
 26      0.684     1.71    2.06    2.48    2.78
 27      0.684     1.70    2.05    2.47    2.77
 28      0.683     1.70    2.05    2.47    2.76
 29      0.683     1.70    2.04    2.46    2.76
 30      0.683     1.70    2.04    2.46    2.75
 35      0.682     1.69    2.03    2.44    2.72
 40      0.681     1.68    2.02    2.42    2.71
 45      0.680     1.68    2.02    2.41    2.69
 50      0.679     1.68    2.01    2.40    2.68
 60      0.678     1.67    2.00    2.39    2.66
  ∞      0.674     1.64    1.96    2.33    2.58

Reproduced by permission of the author, Professor R. A. Fisher, from his book 
on Statistical Methods for Research Workers. 

The reader should sketch the probability curves for $t^2$ and t.
The former is asymptotic to both axes, with ordinate which decreases
continuously as $t^2$ increases. The probability curve of t is
symmetrical about the line t = 0. It is asymptotic to the t-axis at each
end, and the maximum ordinate is that for t = 0. Thus small values of
$t^2$ are more likely than larger values. In this respect $t^2$ differs
from $\chi^2$ when the latter has more than 2 D.F.

86. Test for an assumed population mean 

The statistic $t$ and the above table provide a means of testing an
assumed value $\mu$ for the mean of the normal population from which
the random sample was drawn. On the hypothesis that $\mu$ is the
true value, the equation (2) and the values of $\bar{x}$ and $s$ derived from
the sample enable us to calculate $t$. The table then gives the
probability $P$ that this value will be exceeded numerically in random
sampling from a normal population with mean $\mu$. If $P$ is less than
0.05 we regard our value of $t$ as significant. If $P$ is less than 0.01
we regard it as highly significant. A significant value of $t$ throws
doubt on the truth of the hypothesis that $\mu$ is the mean of the
population.

Example. A random sample of nine from the men of a large city gave a
mean height of 68 in.; and the unbiased estimate $s^2$ of the population variance
found from the sample was 4.5 in.² Are these data consistent with the
assumption of a mean height of 68.5 in. for the men of the city?

Large populations of heights of men are known to be approximately
normal. For the given sample

\[ \bar{x} = 68.0, \qquad \nu = 9 - 1 = 8, \qquad s = \sqrt{4.5} = 2.12, \]

and therefore, on the hypothesis that $\mu = 68.5$, we have

\[ |t| = |\bar{x} - \mu| \sqrt{n}/s = (0.5) \times 3/2.12 = 0.707. \]

From the table we find that, for $\nu = 8$, the probability that this value of $t$
will be exceeded numerically in random sampling is about 0.50. The value
is therefore not at all significant, and the test provides no evidence against
the assumption of a population mean of 68.5 in.

If we make the assumption that $\mu = 69.5$ or 66.5 in. we obtain a value of
$|t|$ three times as great as before, viz. 2.12. The probability that, for $\nu = 8$,
this value will be exceeded in random sampling is greater than 0.05, and the
new value of $t$ is still not significant. There are thus fairly wide limits for the
assumed population mean which, with the data of the sample, will provide
a value of $t$ which is not significant. These limits we proceed to consider.
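The arithmetic of this example can be sketched in a few lines of Python. The function name `one_sample_t` is an illustrative assumption, and the 5 % point 2.31 is simply copied from the table of $t$ for 8 D.F. rather than computed.

```python
import math

def one_sample_t(sample_mean, s_squared, n, assumed_mean):
    """Return (t, degrees of freedom) for an assumed population mean."""
    s = math.sqrt(s_squared)          # s from the unbiased variance estimate
    t = (sample_mean - assumed_mean) * math.sqrt(n) / s
    return t, n - 1

# 5 % point of |t| for 8 D.F., copied from the table of t (not computed here).
T_5_PERCENT_8_DF = 2.31

for mu in (68.5, 69.5, 66.5):
    t, nu = one_sample_t(68.0, 4.5, 9, mu)
    verdict = "significant" if abs(t) > T_5_PERCENT_8_DF else "not significant"
    print(f"mu = {mu}: |t| = {abs(t):.3f}, nu = {nu}, {verdict} at the 5 % level")
```

All three assumed means give a value of $|t|$ below the tabulated 5 % point, in agreement with the discussion above.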

87. Fiducial limits* for the population mean 

Suppose that a certain sample from a normal population has a
mean $\bar{x}$, and provides an unbiased estimate $s^2$ of the population
variance based on $\nu$ D.F. We wish to find limits to the assumed
population mean $\mu$ so that, with the data of the sample, it will lead
to a value of $t$ that is not significant. If our choice is the 5 % level
of significance, we define the 95 % confidence range for $\mu$ as that
range of values of $\mu$ which, with the data of the sample, will furnish
a value of $|t|$ less than the value $t_1$ which corresponds to $P = 0.05$.
This requires

\[ |\bar{x} - \mu| \sqrt{n}/s < t_1, \]

so that

\[ \bar{x} - s t_1/\sqrt{n} < \mu < \bar{x} + s t_1/\sqrt{n}. \tag{5} \]

Consequently $\mu$ must lie within the range extending from $\bar{x} - s t_1/\sqrt{n}$
to $\bar{x} + s t_1/\sqrt{n}$, which is called the 95 % confidence range for $\mu$
corresponding to the given sample; and the bounding values of this range
are the corresponding confidence limits, or fiducial limits, for the
mean of the population. Similarly we have confidence ranges and
limits corresponding to other levels of significance. In each case the
appropriate value of $t_1$ is found from the table of $t$.

* Cf. Fisher, 1930, 3 and 1933, 1.

Example 1. Find the 95 % fiducial limits for $\mu$ corresponding to the sample
in the example of § 86.

Here $\nu = 8$, $t_1 = 2.31$, $s = 2.12$, $n = 9$. Hence

\[ s t_1/\sqrt{n} = (2.12)(2.31)/3 = 1.63. \]

The required limits are therefore $68 \pm 1.63$, that is 66.37 and 69.63 in.

Example 2. Find the 98 % fiducial limits for $\mu$ corresponding to the same
sample.

From the table we find that $t_1 = 2.90$ is the value of $t$ which is exceeded
numerically with a probability of 2 %. Then

\[ s t_1/\sqrt{n} = (2.12)(2.90)/3 = 2.05, \]

and the required limits are $68 \pm 2.05$, that is 65.95 and 70.05 in.
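As a check on the two examples, the limits $\bar{x} \pm s t_1/\sqrt{n}$ may be evaluated directly. This is only a sketch: the helper name `fiducial_limits` is hypothetical, and the values 2.31 and 2.90 are read from the table of $t$ for 8 D.F., not derived in the code.

```python
import math

def fiducial_limits(sample_mean, s_squared, n, t_point):
    """Limits mean -/+ s * t_point / sqrt(n) for the population mean."""
    half_width = math.sqrt(s_squared) * t_point / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

lo95, hi95 = fiducial_limits(68.0, 4.5, 9, 2.31)   # P = 0.05, 8 D.F.
lo98, hi98 = fiducial_limits(68.0, 4.5, 9, 2.90)   # P = 0.02, 8 D.F.
print(f"95 % limits: {lo95:.2f} to {hi95:.2f} in.")
print(f"98 % limits: {lo98:.2f} to {hi98:.2f} in.")
```

The 98 % range is wider than the 95 % range, as it must be, since a larger value of $t_1$ is used.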

88. Comparison of the means of two samples 

Given two independent samples of $n_1$ and $n_2$ members, with means
$\bar{x}_1$ and $\bar{x}_2$ respectively, we may use the $t$ distribution to decide
whether the means differ significantly, or whether the two samples
may be regarded as drawn from the same normal population.* We
test the hypothesis that they are from the same normal population.

* Cf. Fisher, 1925, 1, pp. 90-3.

Let $x_i$ ($i = 1, \ldots, n_1$) be the values of the variable in the first sample,
and $x'_j$ ($j = 1, \ldots, n_2$) those in the second. Then the sums of squares
for the two samples are

\[ n_1 S_1^2 = \sum_i (x_i - \bar{x}_1)^2, \qquad n_2 S_2^2 = \sum_j (x'_j - \bar{x}_2)^2 \]

respectively. Also, if $\sigma^2$ is the variance of the population, $n_1 S_1^2/\sigma^2$
and $n_2 S_2^2/\sigma^2$ are distributed like $\chi^2$ with $n_1 - 1$ and $n_2 - 1$ D.F.
respectively. Consequently $(n_1 S_1^2 + n_2 S_2^2)/\sigma^2$ is a $\chi^2$ with $\nu$ D.F.,
where

\[ \nu = n_1 + n_2 - 2. \]

An unbiased estimate of the population variance, obtained from the
samples, is

\[ s^2 = (n_1 S_1^2 + n_2 S_2^2)/\nu, \]

since

\[ E(\nu s^2) = E(n_1 S_1^2 + n_2 S_2^2) = (n_1 - 1)\sigma^2 + (n_2 - 1)\sigma^2 = \nu\sigma^2, \]

so that $E(s^2) = \sigma^2$.

Now, in virtue of the hypothesis, $\bar{x}_1$ and $\bar{x}_2$ are normally
distributed about the population mean with variances $\sigma^2/n_1$ and
$\sigma^2/n_2$ respectively. Therefore, since the samples are independent, the
difference $\bar{x}_1 - \bar{x}_2$ is normally distributed about zero with variance
$\sigma^2(1/n_1 + 1/n_2)$. If then we define a statistic $t$ by the equation

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}, \]

we have

\[ \frac{t^2}{\nu} = \frac{(\bar{x}_1 - \bar{x}_2)^2 / \{\sigma^2 (1/n_1 + 1/n_2)\}}{\nu s^2/\sigma^2}. \]

The numerator and denominator of the second member are
distributed independently like $\chi^2$ with 1 and $\nu$ D.F. respectively. Hence,
by the Theorem of § 85, the statistic $t$ conforms to the $t$ distribution
for $\nu$ D.F. It is the quotient of the normal deviate $\bar{x}_1 - \bar{x}_2$ by the
estimate of its S.E. derived from the samples. If the value of $t$
obtained from the samples is significant, our hypothesis is
discredited.

We might vary the above argument to test the hypothesis that
the samples were drawn from different normal populations, with
means $\mu$ and $\mu'$ respectively, but the same variance. In this case
$\bar{x}_1 - \mu$ and $\bar{x}_2 - \mu'$ are normally distributed about zero with variances
$\sigma^2/n_1$ and $\sigma^2/n_2$ respectively. Hence their difference

\[ (\bar{x}_1 - \bar{x}_2) - (\mu - \mu') \]

is normally distributed about zero with variance $\sigma^2(1/n_1 + 1/n_2)$.
This difference takes the place of $\bar{x}_1 - \bar{x}_2$ in the above argument, so
that

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu - \mu')}{s\sqrt{1/n_1 + 1/n_2}} \]

conforms to the $t$ distribution for $\nu$ D.F.

Example 1. Show that the 95 % fiducial limits for the difference of the
means of the populations are $\bar{x}_1 - \bar{x}_2 \pm t_1 s \sqrt{1/n_1 + 1/n_2}$, where $t_1$ is the value
of $t$ corresponding to $P = 0.05$ and $\nu$ D.F. If zero lies outside the above range
of $\mu - \mu'$, we conclude that the difference between the means of the samples is
significant of a difference between the means of the populations.

Example 2. The heights of the ten men of a random sample from an
unknown population gave a mean of 69 in., and a sum of squares of deviations
from the mean equal to 42 in.² Apply the $t$ test to the hypothesis that this
sample is from the same population as that of the example in § 86.

From the data we have

\[ n_1 = 9, \quad n_2 = 10, \quad \bar{x}_1 = 68, \quad \bar{x}_2 = 69, \quad n_1 S_1^2 = 8 \times 4.5 = 36, \quad n_2 S_2^2 = 42, \quad \nu = 17. \]

The estimate $s^2$ of the population variance is

\[ s^2 = (36 + 42)/17 = 4.59, \]

so that $s = 2.14$. The numerical value of $t$ from the samples is then

\[ |t| = \frac{1}{2.14} \sqrt{\frac{90}{19}} = 1.02. \]

For $\nu = 17$ this value is not at all significant. The test therefore provides no
evidence against the hypothesis.
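The calculation of Example 2 may be sketched as follows, working from the two sums of squares (36 for the sample of nine, 42 for the sample of ten). The function name `pooled_t` is an assumption made for illustration; the code only evaluates the pooled estimate $s^2$ and the statistic $t$, and leaves the comparison with the table of $t$ to the reader.

```python
import math

def pooled_t(n1, mean1, ss1, n2, mean2, ss2):
    """Two-sample t from the means and sums of squared deviations.

    ss1 and ss2 are the sums of squares about each sample mean, so the
    pooled variance estimate is (ss1 + ss2) / (n1 + n2 - 2).
    """
    nu = n1 + n2 - 2
    s_squared = (ss1 + ss2) / nu
    standard_error = math.sqrt(s_squared * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, nu, s_squared

t, nu, s_squared = pooled_t(9, 68.0, 36.0, 10, 69.0, 42.0)
print(f"s^2 = {s_squared:.2f}, nu = {nu}, |t| = {abs(t):.2f}")
```

With unrounded intermediate values $|t|$ comes out at roughly 1.02, far below the 5 % point for 17 D.F. (about 2.11 in the table of $t$).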

89. Significance of an observed correlation 

The distribution and table of $t$ may also be used to test the
significance of a value $r$ of the correlation coefficient, given by a
random sample of $n$ pairs of values from a bivariate normal
population. The S.E. of $r$ may not be used in the case of small samples,
since the distribution of $r$ is then far from normal. We have seen
that, if the variables in the normal population are uncorrelated,
the value of $r^2$ in random samples of $n$ pairs is a $\beta_1(\tfrac12, \tfrac12(n-2))$
variate. Therefore, by Theorem VII of § 72, $r^2/(1 - r^2)$ is a
$\beta_2(\tfrac12, \tfrac12(n-2))$ variate. Thus, if $t$ is defined by

\[ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \tag{8} \]

then $t^2/(n-2)$ is a $\beta_2(\tfrac12, \tfrac12(n-2))$ variate, so that $t$ conforms to the
$t$ distribution for $n - 2$ D.F.

To test the significance of the sample value of $r$, we make the
assumption that the variables in the population are uncorrelated.
A simple calculation then gives the value of $t$ corresponding to the
sample; and the table of $t$ then tells whether the value obtained is a
rare one. If it is, our assumption is discredited, and we conclude
that the variables in the population are probably correlated.

Example 1. Is a correlation coefficient of 0.5 significant, if obtained
from a random sample of 11 pairs of values from a normal population?

Here $r = \tfrac12$, $\nu = 9$, $t = \tfrac12 \times 3/\sqrt{3/4} = \sqrt{3} = 1.73$. From the table we find
that the probability of obtaining a value of $t$ numerically larger than this is greater than
0.10. Hence there is no reason to suspect the hypothesis of uncorrelated
variables in the population. The value 0.5 is not significant.

Example 2. Find the least value of $r$, in a sample of 27 pairs from a normal
population, that is significant at the 5 % level.

Here $\nu = 25$ and, at the 5 % level of significance, $t = 2.06$. Hence for $r$ to
be significant we must have

\[ \frac{25\, r^2}{1 - r^2} > (2.06)^2, \]

which requires $r^2 > 0.145$ and therefore $|r| > 0.38$. Values of $r$ numerically
less than 0.38 are not significant at the 5 % level.
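Both examples follow from equation (8) and its inverse, $|r| = t/\sqrt{t^2 + \nu}$, which is obtained by solving (8) for $r$. A minimal sketch is given below; the function names are hypothetical, and the values of $t$ passed in are 5 % points read from the table of $t$.

```python
import math

def t_from_r(r, n_pairs):
    """Equation (8): t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 D.F."""
    return r * math.sqrt(n_pairs - 2) / math.sqrt(1.0 - r * r)

def least_significant_r(t_point, nu):
    """Solve (8) for |r| when t equals the chosen significance point."""
    return t_point / math.sqrt(t_point ** 2 + nu)

print(round(t_from_r(0.5, 11), 2))              # Example 1
print(round(least_significant_r(2.06, 25), 3))  # Example 2, nu = 25
print(round(least_significant_r(2.26, 9), 3))   # nu = 9, 5 % point 2.26
```

The last two lines agree with the entries 0.381 and 0.602 given for $\nu = 25$ and $\nu = 9$ in the table of minimum significant values of $r$.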

The accompanying short table gives the least values of $|r|$ that
are significant at the 5 % level, for samples of different sizes from a
normal population. These values may be calculated as in Ex. 2
above. The number $\nu$ of degrees of freedom is $n - 2$, where $n$ is the
number of pairs of values in the sample.

TABLE 5. Minimum values of r that are significant at the 5 % level

   ν      r        ν      r        ν      r        ν      r
   4    0.811     11    0.553     18    0.444     45    0.288
   5    0.755     12    0.532     19    0.433     50    0.273
   6    0.707     13    0.514     20    0.423     60    0.250
   7    0.666     14    0.497     25    0.381     70    0.232
   8    0.632     15    0.482     30    0.349     80    0.217
   9    0.602     16    0.468     35    0.325     90    0.205
  10    0.576     17    0.456     40    0.304    100    0.195

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.

90. Significance of an observed regression coefficient 

The significance of an observed value, $b$, of the linear regression
coefficient of $y$ on $x$, in a random sample from a normal population,
may also be tested by the $t$ distribution and table. It was shown in
§ 83 (Theorem V) that, for random samples of $n$ pairs from an
uncorrelated normal population, $b^2\sigma_1^2/\sigma_2^2$ is a $\beta_2(\tfrac12, \tfrac12(n-1))$ variate.
Consequently the statistic $t$ defined by

\[ t = b\,\sigma_1 \sqrt{n-1}/\sigma_2, \tag{9} \]

conforms to the $t$ distribution for $n - 1$ D.F. But, since $\sigma_1$ and $\sigma_2$
are usually unknown, this relation is not of much use in providing
a test of significance for $b$. However, a similar result is free from this
objection. For, from the relation

\[ b = r S_2/S_1, \tag{10} \]

in which $S_1^2$ and $S_2^2$ are the variances of $x$ and $y$ in the sample, it
follows that

\[ \frac{r^2}{1 - r^2} = \frac{b^2 \sum (x_i - \bar{x})^2/\sigma_2^2}{\sum (y_i - Y_i)^2/\sigma_2^2}, \]

where $Y_i$ as usual denotes the estimate of $y_i$ found from the regression
equation. Now, by § 82, the numerator and denominator of the
second member are distributed like $\chi^2$ with 1 and $n - 2$ D.F.
respectively. The quotient is therefore a $\beta_2(\tfrac12, \tfrac12(n-2))$ variate, and the
statistic $t$ defined by

\[ t = b \sqrt{\left\{ (n-2) \sum (x_i - \bar{x})^2 \Big/ \sum (y_i - Y_i)^2 \right\}}, \tag{11} \]

conforms to the $t$ distribution for $n - 2$ D.F., and may be used to
test the significance of the value of $b$ found from the sample. More
generally Fisher* has shown that, if the variables in the population
are correlated, with $\beta$ as coefficient of regression of $y$ on $x$, the
statistic which conforms to the $t$ distribution is obtained from (11)
on replacing $b$ by $(b - \beta)$.

* Fisher, 1922, 2, p. 609.
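Equation (11) is easily evaluated from the raw pairs of values. The sketch below uses a small made-up data set (the five pairs are purely illustrative, not taken from the text) and a hypothetical function name; it also confirms numerically that (11) gives the same $t$ as equation (8) applied to the sample correlation.

```python
import math

def regression_t(xs, ys):
    """Return (b, t, D.F.) using equation (11) for the regression of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Residual sum of squares about the fitted line Y = y_bar + b (x - x_bar).
    resid = sum((y - (y_bar + b * (x - x_bar))) ** 2 for x, y in zip(xs, ys))
    return b, b * math.sqrt((n - 2) * sxx / resid), n - 2

xs = [1.0, 2.0, 3.0, 4.0, 5.0]      # illustrative data only
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
b, t, nu = regression_t(xs, ys)

# Cross-check against (8): t = r sqrt(n - 2) / sqrt(1 - r^2); here r = 0.8.
r = 0.8
t_from_r = r * math.sqrt(nu) / math.sqrt(1.0 - r * r)
print(f"b = {b:.2f}, t = {t:.3f} on {nu} D.F., t from r = {t_from_r:.3f}")
```

The agreement of the two values of $t$ reflects the relation (10) between $b$ and $r$.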
91. Distribution of the range of a sample

The range, $w$, of a sample of values is the difference between
the highest and lowest values. This concept is widely used in
the statistical control of quality in mass production by an
industrial plant. Though the range does not conform to the $t$-distribution,
it is convenient here to consider the sampling distribution
of $w$ for samples of $n$ values from a continuous population, whose
relative frequency density is $f(x)$ in the interval $(a, b)$. Consider
the infinitesimal intervals $(u, u + du)$ and $(v, v + dv)$ within the
range of variation of $x$, and such that $u < v$. Then the probability
that, at any drawing, the value chosen will lie in the first of these
is $f(u)\,du$, and in the second $f(v)\,dv$. The probability that it will
lie in the intervening interval is

\[ p = \int_{u+du}^{v} f(x)\,dx. \tag{i} \]

Consequently the probability that, in the drawing of $n$ values,
one will lie in the interval $du$, one in the interval $dv$, and the
remaining $n - 2$ in the intervening interval is

\[ dP = n(n-1) f(u) f(v)\, p^{\,n-2}\, du\, dv, \tag{ii} \]

since $n(n-1)$ is the number of ways in which one of the $n$ values
may fall in each of the intervals $du$ and $dv$. In other words, $dP$
is the probability that the lowest value in the sample will fall in
the interval $du$, and the highest in the interval $dv$.

We may express this in terms of $u$ and $w$ instead of $u$ and $v$.
Since $w = v - u$ the Jacobian of $u$, $w$ with respect to $u$, $v$ is unity,
and the probability that the lowest value will fall in the interval
$du$, and $w$ in the interval $dw$, is therefore

\[ n(n-1) f(u) f(u+w) \left( \int_{u}^{u+w} f(x)\,dx \right)^{n-2} du\, dw \tag{iii} \]

to within infinitesimals of this order. Summing for all intervals
$du$ consistent with a range $w$, we find the probability that $w$ will
fall in the interval $dw$, irrespective of the value of $u$, as

\[ dp = n(n-1) \left[ \int_{a}^{b-w} f(u) f(u+w) \left( \int_{u}^{u+w} f(x)\,dx \right)^{n-2} du \right] dw. \tag{12} \]

This is the required probability differential of $w$. Denoting the
coefficient of $dw$ by $\phi(w)$ we have the expected value of $w$ as

\[ E(w) = \int_{0}^{b-a} w\,\phi(w)\,dw. \]

This expected value has been calculated by numerical integration
for the case of a normal population of S.D. $\sigma$, and for various
values of $n$. Among the results found are

n:         2      3      4      5      6      10     50
E(w)/σ:   1.13   1.69   2.06   2.33   2.53   3.08   4.50
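A rough numerical check of these tabulated values can be sketched as follows. Note that the code does not integrate (12) directly: it relies instead on the identity $E(w) = \int_{-\infty}^{\infty} \{1 - F(x)^n - (1 - F(x))^n\}\,dx$, where $F$ is the distribution function of the population. That identity, the function names, the integration limits and the step count are all assumptions brought in for this illustration.

```python
import math

def normal_cdf(x):
    """Distribution function of the standard normal population."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_range_normal(n, lo=-10.0, hi=10.0, steps=20000):
    """E(w)/sigma for samples of n, by the trapezoidal rule.

    Integrand: 1 - F^n - (1 - F)^n, i.e. E(largest) - E(smallest).
    """
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        F = normal_cdf(lo + k * h)
        g = 1.0 - F ** n - (1.0 - F) ** n
        total += 0.5 * g if k in (0, steps) else g
    return total * h

for n in (2, 3, 4, 5, 6, 10, 50):
    print(n, round(expected_range_normal(n), 2))
```

For $n = 2$ the integral has the closed form $2/\sqrt{\pi} \approx 1.128$, which gives a convenient test of the numerical routine.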

Example 1. If the variable in the population has a uniform
distribution from 0 to $b$, then $f(x) = 1/b$. Show that the probability differential
of the range is $n(n-1)\, b^{-n} w^{n-2} (b - w)\, dw$, and that the expected value of
$w$ is $(n-1)\,b/(n+1)$.

Example 2. Show that, in taking a random sample of $n$ numbers
between zero and unity, the probability that the range will exceed 0.5
is $1 - (n+1)/2^n$. Show also that 8 is the least value of $n$ for which this
probability exceeds 0.95.
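The results asserted in these two examples can be verified numerically. The sketch below takes the stated probability differential $n(n-1)\,b^{-n} w^{n-2}(b-w)\,dw$ as given, integrates it by the midpoint rule, and searches for the least $n$; the function names and the step count are illustrative assumptions.

```python
def prob_range_exceeds_half(n):
    """Stated result of Example 2: 1 - (n + 1) / 2^n."""
    return 1.0 - (n + 1) / 2 ** n

def integrate_range_density(n, upper, b=1.0, steps=100000):
    """Midpoint-rule integral of n(n-1) b^-n w^(n-2) (b - w) from 0 to upper."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        total += n * (n - 1) * b ** (-n) * w ** (n - 2) * (b - w)
    return total * h

def expected_range_uniform(n, b=1.0, steps=100000):
    """Midpoint-rule value of E(w), to compare with (n - 1) b / (n + 1)."""
    h = b / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        total += w * n * (n - 1) * b ** (-n) * w ** (n - 2) * (b - w)
    return total * h

least_n = next(n for n in range(2, 50) if prob_range_exceeds_half(n) > 0.95)
print(least_n, prob_range_exceeds_half(7), prob_range_exceeds_half(8))
print(1.0 - integrate_range_density(8, 0.5), expected_range_uniform(8), 7 / 9)
```

For $n = 7$ the probability is 0.9375 and for $n = 8$ it is about 0.965, so that 8 is indeed the least value.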



DISTRIBUTION OF THE VARIANCE RATIO 

92. Ratio of independent estimates of the population variance 

As in § 88, let us consider two independent random samples whose
values are $x_i$ ($i = 1, \ldots, n_1$), and $x'_j$ ($j = 1, \ldots, n_2$), with means $\bar{x}_1$
and $\bar{x}_2$ respectively. These provide estimates $s_1^2$ and $s_2^2$ of the variances
of the populations, given by

\[ s_1^2 = \sum_i (x_i - \bar{x}_1)^2/\nu_1, \qquad s_2^2 = \sum_j (x'_j - \bar{x}_2)^2/\nu_2, \]

corresponding to $\nu_1$ and $\nu_2$ D.F. respectively, where

\[ \nu_1 = n_1 - 1, \qquad \nu_2 = n_2 - 1. \]

We wish to consider whether two such estimates are significantly
different, or whether the samples may be regarded as drawn from
the same normal population of variance $\sigma^2$.

An appropriate test is furnished by the sampling distribution of
the ratio $F$ of two such estimates of $\sigma^2$ obtained from independent
samples from the same normal population. Thus if

\[ F = s_1^2/s_2^2, \tag{13} \]

then

\[ \frac{\nu_1 F}{\nu_2} = \frac{\nu_1 s_1^2/\sigma^2}{\nu_2 s_2^2/\sigma^2}. \]

Now the numerator and denominator of the second member are
distributed independently like $\chi^2$ with $\nu_1$ and $\nu_2$ D.F. respectively.
Hence the quotient is a $\beta_2(\tfrac12\nu_1, \tfrac12\nu_2)$ variate, so that $\nu_1 F/\nu_2$
conforms to the distribution of § 72, with $l = \tfrac12\nu_1$ and $m = \tfrac12\nu_2$.
Consequently the probability that the value of $F$ will fall in the interval
$dF$ is

\[ dP = \frac{\nu_1^{\nu_1/2}\, \nu_2^{\nu_2/2}\, F^{(\nu_1 - 2)/2}\, dF}{B(\tfrac12\nu_1, \tfrac12\nu_2)\, (\nu_1 F + \nu_2)^{(\nu_1 + \nu_2)/2}}. \tag{14} \]

This is the required distribution of the variance ratio for $\nu_1$ and $\nu_2$ D.F.
It will be observed that the distribution is independent of the
variance $\sigma^2$ of the population.

Fig. 9

Since $\nu_1 F/\nu_2$ is a $\beta_2(\tfrac12\nu_1, \tfrac12\nu_2)$ variate its modal value, by § 72 (28),
is $(\tfrac12\nu_1 - 1)/(\tfrac12\nu_2 + 1)$. Consequently the modal value of $F$ is given by

\[ F = \frac{\nu_2 (\nu_1 - 2)}{\nu_1 (\nu_2 + 2)}, \]

and this is always less than unity. Similarly, the mean value of
$\nu_1 F/\nu_2$, by § 72 (29), is $\tfrac12\nu_1/(\tfrac12\nu_2 - 1)$; and therefore the expected
value of $F$ is

\[ E(F) = \frac{\nu_2}{\nu_2 - 2}, \]

which is independent of $\nu_1$, and is always greater than unity. The
probability curve for $F$ depends, of course, on both $\nu_1$ and $\nu_2$; but
its main features, for $\nu_1 > 4$, are those of the curve in Fig. 9.

93. Fisher's z distribution. Table of F 

A distribution equivalent to (14) was first obtained by R. A. Fisher.
Writing $z = \tfrac12 \log_e F$ and therefore $F = e^{2z}$ in the above result, we
deduce immediately that

\[ dp = \frac{2\, \nu_1^{\nu_1/2}\, \nu_2^{\nu_2/2}\, e^{\nu_1 z}\, dz}{B(\tfrac12\nu_1, \tfrac12\nu_2)\, (\nu_1 e^{2z} + \nu_2)^{(\nu_1 + \nu_2)/2}}. \tag{15} \]



This is Fisher's $z$ distribution. The probability that a specified value
of $z$ will be exceeded in random sampling depends upon $\nu_1$ and $\nu_2$.
Fisher published tables* giving the values of $z$ that will be exceeded
with probabilities 0.05 and 0.01 respectively, corresponding to
specified values of $\nu_1$ and $\nu_2$. From these G. W. Snedecor prepared
a table† for the variance ratio, which he denoted by $F$ in honour of
Fisher. Extracts from this table are here printed by permission of
Snedecor and the Iowa Press. The ratio, $F$, tabulated is that of the
larger estimate of variance to the smaller. The number $\nu_1$ of degrees
of freedom corresponding to the larger estimate determines the
column in the table, while $\nu_2$ determines the row. At the
intersection of the row and the column are given two values of $F$. The
upper is the value that will be exceeded with a probability 0.05,
and the lower with a probability 0.01. These are often referred to
as the 5 and 1 % 'points' of $F$. The latter is, of course, always the
larger. The hypothesis to be tested is that the samples are from the
same normal population, or from normal populations of equal
variance. A value of $F$ less than the 5 % point is not significant.
A value between the 5 and 1 % points is significant at the former
level, but not at the latter. A value greater than the 1 % point is
regarded as highly significant. A significant value of $F$ throws doubt
on the truth of the hypothesis. This test will be used extensively in
the next chapter.

* Statistical Methods for Research Workers, 1925.
† Snedecor, 1934, 2 or 1938, 3, pp. 184-7.

TABLE 6. The variance ratio. 5 and 1 % 'points' of F

ν₁ is the number of degrees of freedom for the greater
estimate of variance, and ν₂ for the smaller. In each pair of
lines the upper is the 5 % point and the lower the 1 % point.

 ν₂ \ ν₁     1       2       3       4       5       6       8      12      24       ∞

    2     18.51   19.00   19.16   19.25   19.30   19.33   19.37   19.41   19.45   19.50
          98.49   99.00   99.17   99.25   99.30   99.33   99.36   99.42   99.46   99.50

    3     10.13    9.55    9.28    9.12    9.01    8.94    8.84    8.74    8.64    8.53
          34.12   30.82   29.46   28.71   28.24   27.91   27.49   27.05   26.60   26.12

    4      7.71    6.94    6.59    6.39    6.26    6.16    6.04    5.91    5.77    5.63
          21.20   18.00   16.69   15.98   15.52   15.21   14.80   14.37   13.93   13.46

    5      6.61    5.79    5.41    5.19    5.05    4.95    4.82    4.68    4.53    4.36
          16.26   13.27   12.06   11.39   10.97   10.67   10.27    9.89    9.47    9.02

    6      5.99    5.14    4.76    4.53    4.39    4.28    4.15    4.00    3.84    3.67
          13.74   10.92    9.78    9.15    8.75    8.47    8.10    7.72    7.31    6.88

    7      5.59    4.74    4.35    4.12    3.97    3.87    3.73    3.57    3.41    3.23
          12.25    9.55    8.45    7.85    7.46    7.19    6.84    6.47    6.07    5.65

    8      5.32    4.46    4.07    3.84    3.69    3.58    3.44    3.28    3.12    2.93
          11.26    8.65    7.59    7.01    6.63    6.37    6.03    5.67    5.28    4.86

    9      5.12    4.26    3.86    3.63    3.48    3.37    3.23    3.07    2.90    2.71
          10.56    8.02    6.99    6.42    6.06    5.80    5.47    5.11    4.73    4.31

   10      4.96    4.10    3.71    3.48    3.33    3.22    3.07    2.91    2.74    2.54
          10.04    7.56    6.55    5.99    5.64    5.39    5.06    4.71    4.33    3.91

   12      4.75    3.88    3.49    3.26    3.11    3.00    2.85    2.69    2.50    2.30
           9.33    6.93    5.95    5.41    5.06    4.82    4.50    4.16    3.78    3.36

   14      4.60    3.74    3.34    3.11    2.96    2.85    2.70    2.53    2.35    2.13
           8.86    6.51    5.56    5.03    4.69    4.46    4.14    3.80    3.43    3.00

   16      4.49    3.63    3.24    3.01    2.85    2.74    2.59    2.42    2.24    2.01
           8.53    6.23    5.29    4.77    4.44    4.20    3.89    3.55    3.18    2.75

   18      4.41    3.55    3.16    2.93    2.77    2.66    2.51    2.34    2.15    1.92
           8.28    6.01    5.09    4.58    4.25    4.01    3.71    3.37    3.01    2.57

   20      4.35    3.49    3.10    2.87    2.71    2.60    2.45    2.28    2.08    1.84
           8.10    5.85    4.94    4.43    4.10    3.87    3.56    3.23    2.86    2.42

   25      4.24    3.38    2.99    2.76    2.60    2.49    2.34    2.16    1.96    1.71
           7.77    5.57    4.68    4.18    3.86    3.63    3.32    2.99    2.62    2.17

   30      4.17    3.32    2.92    2.69    2.53    2.42    2.27    2.09    1.89    1.62
           7.56    5.39    4.51    4.02    3.70    3.47    3.17    2.84    2.47    2.01

   40      4.08    3.23    2.84    2.61    2.45    2.34    2.18    2.00    1.79    1.51
           7.31    5.18    4.31    3.83    3.51    3.29    2.99    2.66    2.29    1.81

   60      4.00    3.15    2.76    2.52    2.37    2.25    2.10    1.92    1.70    1.39
           7.08    4.98    4.13    3.65    3.34    3.12    2.82    2.50    2.12    1.60

   80      3.96    3.11    2.72    2.49    2.33    2.21    2.06    1.88    1.65    1.32
           6.96    4.88    4.04    3.56    3.25    3.04    2.74    2.41    2.03    1.49

Extracted from G. W. Snedecor's Statistical Methods, pp. 184-7, by courtesy
of the author and the Iowa Collegiate Press.

The test is illustrated by the following numerical example.

Example. Apply the test to the samples of heights given in the examples
of § 86 and § 88 (Ex. 2).

The two estimates of the population variance furnished by the samples
are 36/8 and 42/9, corresponding to 8 and 9 D.F. respectively. The second is
the larger, so that

\[ F = \frac{42/9}{36/8} = 1.04. \]

This value of $F$ is much below the 5 % point as indicated by the table. It is
therefore not at all significant, so that the samples may very well be regarded
as drawn from the same population.
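The variance ratio for this example may be formed as follows. The sketch simply places the larger estimate in the numerator, as the table requires, and reports the corresponding degrees of freedom; the function name is a hypothetical one chosen for illustration.

```python
def variance_ratio(ss_a, df_a, ss_b, df_b):
    """Return (F, nu1, nu2) with the larger variance estimate on top.

    ss_a and ss_b are sums of squared deviations; df_a and df_b their D.F.
    nu1 belongs to the larger estimate (column of the table), nu2 to the
    smaller (row of the table).
    """
    est_a, est_b = ss_a / df_a, ss_b / df_b
    if est_a >= est_b:
        return est_a / est_b, df_a, df_b
    return est_b / est_a, df_b, df_a

F, nu1, nu2 = variance_ratio(36.0, 8, 42.0, 9)
print(f"F = {F:.2f} with nu1 = {nu1}, nu2 = {nu2}")
```

Here $\nu_1 = 9$ and $\nu_2 = 8$; the neighbouring tabulated 5 % points for $\nu_2 = 8$ (3.44 for $\nu_1 = 8$ and 3.28 for $\nu_1 = 12$) are both far above the observed ratio.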



FISHER'S TRANSFORMATION OF THE 
CORRELATION COEFFICIENT 

94. Distribution of r. Fisher's transformation 

The distribution of the correlation coefficient $r$, in random
samples of $n$ pairs of values from a bivariate normal population in
which the correlation is $\rho$, was given by Fisher* in 1915. He showed
that the probability that the correlation coefficient will have a 
value in the interval dr is 



This distribution is far from normal, with a probability curve which
is very skew in the neighbourhood of $\rho = \pm 1$, even for large samples.
The use of the S.E. of $r$ is therefore not to be recommended. In a
subsequent paper† Fisher showed that the transformation

\[ z = \tfrac12 \log_e \frac{1+r}{1-r}, \qquad \zeta = \tfrac12 \log_e \frac{1+\rho}{1-\rho} \tag{16} \]

defines a variate $z$, whose distribution is approximately normal with
mean $\zeta$ and variance $1/(n-3)$, tending rapidly to normality as the
size of the sample increases. Thus the S.E. of $z$ is independent of the
value of $r$. For the proofs of these properties we must refer the
reader to Fisher's papers; but the distribution of $r$ in the case of an
uncorrelated normal population was considered in § 82, and the
corresponding distribution of $z$ will be discussed in Ex. 1 below.

* Fisher, 1915, 2. † Fisher, 1921, 1.

By means of the statistic $z$ we may test whether an observed
correlation coefficient differs significantly from some theoretical
value, or from some value given in advance; or whether the values
of $r$ obtained from two samples differ significantly. From the
values of $r$ and $\rho$ we determine those of $z$ and $\zeta$ by (16); and it is
then easy to decide whether the deviation $z - \zeta$ is significant for a
normal distribution of variance $1/(n-3)$. To obviate the necessity
of calculating $z$ in every case, Fisher published a table setting out
the values of $r$, which correspond to specified values of $z$ ranging
from 0 to 3 at intervals of 0.01. Extracts from this table are printed
herewith.

TABLE 7. Fisher's transformation of r
Values of r for specified values of z at intervals of 0.02

   z      0.00     0.02     0.04     0.06     0.08
  0.0    0.000    0.020    0.040    0.060    0.080
  0.1    0.100    0.119    0.139    0.159    0.178
  0.2    0.197    0.217    0.236    0.254    0.273
  0.3    0.291    0.310    0.328    0.345    0.363
  0.4    0.380    0.397    0.414    0.430    0.446
  0.5    0.462    0.478    0.493    0.508    0.523
  0.6    0.537    0.551    0.565    0.578    0.592
  0.7    0.604    0.617    0.629    0.641    0.653
  0.8    0.664    0.675    0.686    0.696    0.706
  0.9    0.716    0.726    0.735    0.744    0.753
  1.0    0.762    0.770    0.778    0.786    0.793
  1.1    0.801    0.808    0.814    0.821    0.828
  1.2    0.834    0.840    0.846    0.851    0.857
  1.3    0.862    0.867    0.872    0.876    0.881
  1.4    0.885    0.890    0.894    0.898    0.902
  1.5    0.905    0.909    0.912    0.915    0.919
  1.6    0.922    0.925    0.928    0.930    0.933
  1.7    0.936    0.938    0.940    0.943    0.945
  1.8    0.947    0.949    0.951    0.953    0.955
  1.9    0.956    0.958    0.960    0.961    0.963

Reproduced by permission of the author, Professor R. A. Fisher, from his book
on Statistical Methods for Research Workers.

For the case $\rho = 0$, that is to say for testing whether an observed
value of $r$ indicates any correlation in the population, the method
of § 89 is preferable.

Example 1. For samples from an uncorrelated normal population, we
know that the distribution of $r$ is

\[ dp = \frac{(1 - r^2)^{(n-4)/2}\, dr}{B(\tfrac12, \tfrac12(n-2))}. \]

Fisher's transformation (16) may be expressed

\[ r = \tanh z, \]

so that

\[ dr = \operatorname{sech}^2 z\, dz, \]

and the distribution of $z$ is therefore

\[ dp = \operatorname{sech}^{n-2} z\, dz / B(\tfrac12, \tfrac12(n-2)). \]

Now $\operatorname{sech} z$ is approximately equal to $\exp(-\tfrac12 z^2)$, so that

\[ dp \propto \exp\left(-\tfrac12 (n-2) z^2\right) dz \]

approximately. Consequently the distribution of $z$ is approximately normal,
with variance $1/(n-2)$. As Fisher has shown, however, a better
approximation to the variance of $z$ is $1/(n-3)$.
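The closing remark of this example can be illustrated numerically. The sketch below computes the variance of $z$ for the exact distribution $dp \propto \operatorname{sech}^{n-2} z\, dz$ by the trapezoidal rule and compares it with the two approximations $1/(n-2)$ and $1/(n-3)$. The function name, the integration limits and the choice $n = 28$ are assumptions made for illustration only.

```python
import math

def variance_of_z(n, half_width=8.0, steps=16000):
    """Variance of z when dp is proportional to sech^(n-2) z dz."""
    h = 2.0 * half_width / steps
    total = second_moment = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal end weights
        density = weight * (1.0 / math.cosh(z)) ** (n - 2)
        total += density
        second_moment += density * z * z
    return second_moment / total    # the mean of z is zero by symmetry

n = 28
v = variance_of_z(n)
print(f"variance of z = {v:.5f}")
print(f"1/(n-2) = {1 / (n - 2):.5f}, 1/(n-3) = {1 / (n - 3):.5f}")
```

For $n = 28$ the computed variance lies much closer to $1/(n-3) = 0.04$ than to $1/(n-2)$.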

Example 2. In a random sample of 28 pairs of values from a bivariate
normal population, the correlation was found to be 0.7. Is this value
consistent with the assumption that the correlation in the population is 0.5?

Here $r = 0.7$, $\rho = 0.5$ and $n = 28$. From the table we find $z = 0.87$ and
$\zeta = 0.55$, so that $z - \zeta = 0.32$.

The S.E. of $z$ is $1/\sqrt{25} = \tfrac15$, so that

\[ (z - \zeta)/\text{S.E.} = 1.6. \]

Since $z - \zeta$ is less than twice the S.E., its value is not significant.
So far as this test goes, the correlation in the population might very well
be 0.5.

The 95 % fiducial limits for $\rho$ are found in the usual manner (cf. § 51).
The value of $\zeta$ must be such that

\[ |z - \zeta| < 1.96\,(\text{S.E.}) = 0.392. \]

Consequently

\[ 0.87 - 0.392 < \zeta < 0.87 + 0.392, \]
\[ 0.48 < \zeta < 1.26, \]

and therefore from the table

\[ 0.446 < \rho < 0.851. \]

The 95 % fiducial limits for $\rho$ are therefore 0.45 and 0.85 approximately.
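The same test can be carried out without the table, since the transformation (16) is the inverse hyperbolic tangent. The following is a sketch under that observation; the function names are hypothetical, and because $z$ is not rounded to two decimals the results differ slightly in the third figure from those read off the table.

```python
import math

def fisher_z_test(r, rho, n):
    """Return (z, zeta, S.E., deviation in units of the S.E.)."""
    z, zeta = math.atanh(r), math.atanh(rho)
    se = 1.0 / math.sqrt(n - 3)
    return z, zeta, se, (z - zeta) / se

def fiducial_limits_rho(r, n, normal_point=1.96):
    """Limits for rho from z -/+ normal_point * S.E., transformed back."""
    z = math.atanh(r)
    half_width = normal_point / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

z, zeta, se, ratio = fisher_z_test(0.7, 0.5, 28)
lower, upper = fiducial_limits_rho(0.7, 28)
print(f"z = {z:.3f}, zeta = {zeta:.3f}, S.E. = {se:.2f}, ratio = {ratio:.2f}")
print(f"95 % limits for rho: {lower:.2f} to {upper:.2f}")
```

The deviation is about 1.6 times the S.E., and the limits come out close to 0.44 and 0.85.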

95. Comparison of correlations in independent samples

Next suppose that two independent samples of $n_1$ and $n_2$ pairs
give correlation coefficients of $r_1$ and $r_2$ respectively. May they be
regarded as drawn from the same population; or is the difference
between $r_1$ and $r_2$ significant? On the assumption that the samples
are from the same normal population, the difference between the
values of $z$ for the two samples is normally distributed with S.E.

\[ \epsilon = \sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}. \]

The two values of $z$ are

\[ z_1 = \tfrac12 \log_e \frac{1 + r_1}{1 - r_1}, \qquad z_2 = \tfrac12 \log_e \frac{1 + r_2}{1 - r_2}. \]

If $|z_1 - z_2|$ is less than $2\epsilon$, the difference is not significant at the
5 % level; and the assumption that the samples are from the same
population, or from equally correlated normal populations, is not
discredited.

Example. The first of two samples consists of 23 pairs, and gives a
correlation of 0.5; while the second, of 28 pairs, has a correlation of 0.8. Are these
values significantly different?

On the hypothesis that they are from the same normal population, the
S.E. of the difference of the $z$'s is

\[ \epsilon = \sqrt{\tfrac{1}{20} + \tfrac{1}{25}} = 0.3. \]

From the table we find that $z_1 = 0.55$ and $z_2 = 1.10$, so that

\[ |z_1 - z_2| = 0.55 = 1.83\,\epsilon, \]

which is a little less than $2\epsilon$, and is therefore not quite significant at the 5 %
level. The hypothesis is not discredited.

If in the above example the value of $r_1$ is not given, we may find the limits
between which it must lie in order that $|r_1 - r_2|$ should not be significant at
the 5 % level. For the condition to be satisfied is

\[ |z_1 - z_2| < 1.96\,\epsilon = 0.588. \]

Consequently

\[ 1.10 - 0.588 < z_1 < 1.10 + 0.588, \]
\[ 0.512 < z_1 < 1.688, \]

so that $0.47 < r_1 < 0.93$.
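A short sketch of this comparison is given below. It uses `math.atanh` for the transformation (16) in place of the table; the function names are illustrative assumptions, and the factor 1.96 is the usual 5 % point of the normal distribution.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Return (S.E. of z1 - z2, |z1 - z2|, their ratio)."""
    eps = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    diff = abs(math.atanh(r1) - math.atanh(r2))
    return eps, diff, diff / eps

def limits_for_r1(r2, n1, n2, normal_point=1.96):
    """Range of r1 for which |z1 - z2| stays below normal_point * S.E."""
    eps = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z2 = math.atanh(r2)
    return math.tanh(z2 - normal_point * eps), math.tanh(z2 + normal_point * eps)

eps, diff, ratio = compare_correlations(0.5, 23, 0.8, 28)
low, high = limits_for_r1(0.8, 23, 28)
print(f"eps = {eps:.2f}, |z1 - z2| = {diff:.3f} = {ratio:.2f} eps")
print(f"r1 not significantly different from 0.8 if {low:.2f} < r1 < {high:.2f}")
```

The ratio is about 1.83, just short of the 5 % level, and the limits for $r_1$ are approximately 0.47 and 0.93.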

96. Combination of estimates of a correlation coefficient* 

* Cf. Yates, 1934, 5.

Suppose that $k$ samples of $n_i$ pairs of values ($i = 1, \ldots, k$) yield
correlation coefficients $r_i$. We may wish to enquire whether the
samples may be regarded as drawn from the same normal population
(or equally correlated ones); and, if they may be so regarded, to
obtain a combined estimate of the population value $\rho$. To test the
homogeneity of the estimates $r_i$, we make the assumption that the
samples are from equally correlated populations. Then, by means
of Fisher's transformation (16), we obtain values $z_i$ of variates
which are approximately normally distributed about a common
mean, with variances $1/(n_i - 3)$. The estimate of their common
mean, $\zeta$, which has minimum variance, is obtained by weighting
the values $z_i$ inversely† as their variances. This estimate $\bar{z}$ is therefore

\[ \bar{z} = \frac{\sum (n_i - 3)\, z_i}{\sum (n_i - 3)}. \tag{17} \]

Then, since the variates $z_i$ are approximately normally distributed
about $\zeta$, with variances $1/(n_i - 3)$, the sum $\sum (n_i - 3)(z_i - \bar{z})^2$ is
distributed approximately as $\chi^2$, with $k - 1$ D.F., the mean $\bar{z}$ having
been determined from the data.‡ The significance of the calculated
value of this quantity may be ascertained from the table of $\chi^2$.
We may express the above sum in a form more convenient for
numerical calculation. Thus

\[ \sum (n_i - 3)(z_i - \bar{z})^2 = \sum (n_i - 3)\, z_i^2 - \left[ \sum (n_i - 3)\, z_i \right]^2 \Big/ \sum (n_i - 3). \]

If the calculated value of this expression is not significant as a
value of $\chi^2$ with $k - 1$ D.F., the estimates $r_i$ of the correlation in the
population may be regarded as homogeneous. In that case the
value of $\bar{z}$ given by (17) is an estimate of the true value, $\zeta$,
corresponding to the population coefficient $\rho$. The required estimate of
$\rho$ is then given by

\[ \rho = \tanh \bar{z}, \]

and its value may be read off from the table.

† Cf. Ex. IV, 5.
‡ For details of this part of the proof see Ex. X, 10 below.

Example. Independent samples of 21, 30, 39, 26 and 35 pairs of values
yielded correlation coefficients 0.39, 0.61, 0.43, 0.54 and 0.48 respectively.
May these estimates be regarded as homogeneous? If so, find an estimate of
the correlation in the population.

The corresponding values of $z_i$ may be taken from the table, and the
calculation tabulated as follows:

   r_i      z_i     n_i − 3   (n_i − 3) z_i   (n_i − 3) z_i²
   0.39    0.412      18         7.416           3.055
   0.61    0.709      27        19.143          13.572
   0.43    0.460      36        16.560           7.618
   0.54    0.604      23        13.892           8.391
   0.48    0.524      32        16.768           8.786
  Totals             136        73.779          41.422

The value of $\chi^2$ from the data is therefore

\[ \chi^2 = 41.422 - (73.779)^2/136 = 1.4 \text{ approximately.} \]

For 4 D.F. this value is not at all significant, so that the coefficients $r_i$ may be
regarded as homogeneous. From (17) we have

\[ \bar{z} = 73.779/136 = 0.5425. \]

The table then gives the estimate of $\rho$ for the population as 0.495 = 0.5
nearly.
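The tabulated calculation can be reproduced in a few lines. This is a sketch only: the function name is hypothetical, and `math.atanh` is used in place of the table, so the intermediate figures differ from those above in the third decimal place.

```python
import math

def combine_correlations(sample_sizes, correlations):
    """Return (chi-square, D.F., weighted mean z, combined estimate of rho)."""
    weights = [n - 3 for n in sample_sizes]           # reciprocals of the variances
    zs = [math.atanh(r) for r in correlations]        # transformation (16)
    sum_w = sum(weights)
    sum_wz = sum(w * z for w, z in zip(weights, zs))
    sum_wz2 = sum(w * z * z for w, z in zip(weights, zs))
    z_bar = sum_wz / sum_w                            # equation (17)
    chi2 = sum_wz2 - sum_wz ** 2 / sum_w              # sum of (n_i - 3)(z_i - z_bar)^2
    return chi2, len(zs) - 1, z_bar, math.tanh(z_bar)

chi2, df, z_bar, rho_hat = combine_correlations(
    [21, 30, 39, 26, 35], [0.39, 0.61, 0.43, 0.54, 0.48])
print(f"chi2 = {chi2:.2f} on {df} D.F., z_bar = {z_bar:.4f}, rho = {rho_hat:.3f}")
```

The value of $\chi^2$ is again about 1.4 on 4 D.F., and the combined estimate of $\rho$ is close to 0.495.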
EXAMPLES X

1. A random sample of 16 values from a normal population 
showed a mean of 41*5 in., and a sum of squares of deviations from 
this mean equal to 135 in. 2 Show that the assumption of a mean 
of 43-5 in. for the population is not reasonable, and that the 95 % 
fiducial limits for this mean are 399 and 43*1 in. 
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Another sample of 20 values from an unknown population has a 
mean of 43-0 in., and a sum of squares of deviations from this mean 
equal to 171 in. 2 Show that the two samples may be regarded as 
from the same normal population. 

2. Nine patients, to whom a certain drink was administered, 
registered the following increments in blood pressure: 7, 3, 1, 4, 
3, 5, 6, 4, 1. Show that the data do not indicate that the drink 
was responsible for these increments. 

On the null hypothesis the above values are regarded as a random
sample from a normal population whose mean is zero. The data
give $\bar{x} = 2$ and $s^2 = 63/4$. Hence $t = 1.51$ with $\nu = 8$.
This value is not significant.

3. In testing the superiority of Leake's drill over the ordinary
drill, plots in the form of long strips were cultivated, two adjacent
strips being allotted at random to Leake's drill and the ordinary.*
For ten such pairs of plots the values of the excess of the weight of
grain from the plot treated by Leake's drill over that obtained by
use of the ordinary drill were 2.4, 1.0, 0.7, 0.0, 1.1, 1.6, 1.1, -0.4,
0.1 and 0.7. Show that the data furnish strong evidence of the
superiority of Leake's drill.

From the data $\bar{x} = 0.83$ and $\sum (x - \bar{x})^2 = 6.001$, so that

$$s^2 = 6.001/9 = 0.666.$$

On the null hypothesis the mean of the population is zero, and
$t = 3.22$ for 9 D.F. This value belongs to about the 1 % level of
significance. There is thus strong evidence against the null
hypothesis.

* Data from Wishart, 1934, 3, p. 32.

4. For a random sample of 10 pigs, fed on diet A, the increases in
weight in a certain period were

10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lb.

For another random sample of 12 pigs, fed on diet B, the increases
in the same period were

7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17 lb.

Show that, by the test of § 88, the mean increases of 12 and 15 lb.
in the two samples are not significantly different.

[$s^2 = (120 + 314)/20 = 21.7$; $t = 1.5$, $\nu = 20$.]
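
The values of t quoted in Exs. 3 and 4 may be verified numerically. The sketch below is an illustration only; the helper names `one_sample_t` and `pooled_two_sample_t` are assumptions introduced here, and the second function uses the pooled estimate of variance on $n_1 + n_2 - 2$ D.F., which is the form consistent with the bracketed hint to Ex. 4.

```python
import math

def one_sample_t(xs, mu0=0.0):
    """t for the hypothesis that the population mean is mu0."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)   # unbiased variance
    return (mean - mu0) / math.sqrt(s2 / n), n - 1

def pooled_two_sample_t(a, b):
    """t for the difference of two means, with a pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    s2 = ss / df
    return (mb - ma) / math.sqrt(s2 * (1 / na + 1 / nb)), df

# Ex. 3: excess yields for Leake's drill over the ordinary drill
t3, df3 = one_sample_t([2.4, 1.0, 0.7, 0.0, 1.1, 1.6, 1.1, -0.4, 0.1, 0.7])

# Ex. 4: increases in weight on diets A and B
diet_a = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9]
diet_b = [7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17]
t4, df4 = pooled_two_sample_t(diet_a, diet_b)

print(f"Ex. 3: t = {t3:.2f} on {df3} D.F.")
print(f"Ex. 4: t = {t4:.2f} on {df4} D.F.")
```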

5. Show that the estimates of the population variance from the 
samples in Ex. 4 are not significantly different. 

6. Deduce from § 82 (25) that the statistic $r\sqrt{n - 2}/\sqrt{1 - r^2}$
conforms to the t distribution for $n - 2$ D.F.

7. A random sample of 18 pairs from a bivariate normal population
showed a correlation coefficient of 0.3. Is this value significant
of correlation in the population? Prove by the method of § 89,
Ex. 2, that the least value of $r$ significant at the 5 % level is
about 0.47.

8. A random sample of 19 pairs from a bivariate normal population
showed a correlation of 0.65. Prove that this is consistent with
the assumption of a correlation of 0.40 in the population. Also
show that the 95 % fiducial limits for $\rho$ are 0.28 and 0.85
approximately.

A second sample, of 23 pairs, showed a correlation of 0.40. Prove
by the method of § 95 that the two samples may be regarded as
from equally correlated populations.

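9. Show that the mean value of the positive square root of a
Beta variate of the second kind, with parameters $l$ and $m$, is

$$\frac{\Gamma(l + \tfrac12)\,\Gamma(m - \tfrac12)}{\Gamma(l)\,\Gamma(m)}.$$

Deduce that the mean value of $|t|$ for $\nu$ D.F. is

$$\sqrt{\frac{\nu}{\pi}}\;\frac{\Gamma\{\tfrac12(\nu - 1)\}}{\Gamma(\tfrac12\nu)}.$$

Also show that, for samples of $n$ pairs from an uncorrelated bivariate
normal population in which the variances are $\sigma_1^2$ and
$\sigma_2^2$, the mean value of the modulus of the regression
coefficient of $y$ on $x$ is

$$\frac{\sigma_2}{\sigma_1\sqrt{\pi}}\;\frac{\Gamma\{\tfrac12(n - 2)\}}{\Gamma\{\tfrac12(n - 1)\}}.$$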
10. To prove that the sum $\sum_i (n_i - 3)(z_i - \bar{z})^2$ of § 96 is
distributed approximately as $\chi^2$ with $k - 1$ D.F., we may proceed
by the method of § 77. Denoting the sum by $T^2$ we have

$$T^2 = \sum_i (n_i - 3)\,[(z_i - \zeta) - (\bar{z} - \zeta)]^2$$
$$\;\; = \sum_i (n_i - 3)(z_i - \zeta)^2 - 2(\bar{z} - \zeta)\sum_i (n_i - 3)(z_i - \zeta) + (\bar{z} - \zeta)^2 \sum_i (n_i - 3)$$
$$\;\; = \sum_i (n_i - 3)(z_i - \zeta)^2 - \xi_1^2, \tag{i}$$

where

$$\xi_1 = \sum_i (n_i - 3)(z_i - \zeta) \Big/ \sqrt{\textstyle\sum_i (n_i - 3)}.$$

Now introduce an orthogonal linear transformation

$$\xi_i = \sum_j c_{ij} x_j, \qquad (i = 1, 2, \ldots, k),$$

where

$$x_j = (z_j - \zeta)\sqrt{n_j - 3},$$

and the first of the $\xi$'s is identical with $\xi_1$ above. Then, since
the $x$'s are independently and normally distributed about zero with
unit S.D., so are the $\xi$'s; and in virtue of (i)

$$T^2 = \sum_{i=1}^{k} x_i^2 - \xi_1^2 = \sum_{i=1}^{k} \xi_i^2 - \xi_1^2 = \sum_{i=2}^{k} \xi_i^2.$$

Consequently $T^2$ is distributed like $\chi^2$ with $k - 1$ D.F.

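11. Show that, for the t distribution with $\nu$ D.F., the moment of
order $r$ (even) about the origin is, for $r < \nu$,

$$\nu^{r/2}\,\frac{\Gamma\{\tfrac12(r + 1)\}\,\Gamma\{\tfrac12(\nu - r)\}}{\Gamma(\tfrac12)\,\Gamma(\tfrac12\nu)}.$$

12. For samples of $n$ from a population with the exponential
distribution $f(x) = e^{-x}$, $x \ge 0$, show that the range conforms to

$$\phi(w) = (n - 1)\,e^{-w}(1 - e^{-w})^{n-2},$$

and hence that $E(w) = 1 + \tfrac12 + \tfrac13 + \tfrac14 + \ldots + \dfrac{1}{n - 1}$.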



CHAPTER XI 

ANALYSIS OF VARIANCE AND COVARIANCE 
ANALYSIS OF VARIANCE 

97. Resolution of the 'sum of squares' 

In the words of its author, R. A. Fisher, analysis of variance is
the 'separation of the variance ascribable to one group of causes
from the variance ascribable to other groups'.* It is a procedure by
which the variation embodied in the data of the sample may be
resolved into component variations due to independent factors.
Each of the components yields an estimate of the population
variance; and these estimates are tested for homogeneity by means of
the F table.

Consider a random sample of $N$ values of a normally distributed
variable $x$. It is frequently possible to arrange these in classes
according to a certain factor or criterion. For instance, if the
variable is the price of a certain commodity, the classes may
correspond to different seasons or to different districts. Or, if the
variable is the crop yield of a variety of cereal, the classes may
correspond to different manurial treatments. Let $x_{ij}$ denote the
value of the $j$th member in the $i$th class. Thus the first subscript
indicates the class, and the second the position in that class. Let
$n_i$ be the number of members in the $i$th class, $\bar{x}_i$ the mean
value for that class, and $\bar{x}$ the general mean for the whole
sample of $N$ values.

Then
$$N\bar{x} = \sum_i \sum_j x_{ij}, \qquad \sum_i \sum_j (x_{ij} - \bar{x}) = 0 \tag{1}$$
and
$$n_i \bar{x}_i = \sum_j x_{ij}, \qquad \sum_j (x_{ij} - \bar{x}_i) = 0. \tag{2}$$

To resolve the sum of squares of the deviations of the $N$ values
$x_{ij}$ from the general mean, we may do so first for the members of
the $i$th class. Thus, in virtue of § 3 (10),

$$\sum_j (x_{ij} - \bar{x})^2 = \sum_j (x_{ij} - \bar{x}_i)^2 + n_i(\bar{x}_i - \bar{x})^2,$$

and on summing this result for all the classes we have the required
resolution of the 'sum of squares'

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = \sum_i \sum_j (x_{ij} - \bar{x}_i)^2 + \sum_i n_i(\bar{x}_i - \bar{x})^2. \tag{3}$$

This formula holds, of course, whether the population is normal
or not.

* Fisher, 1938, 2, p. 216.
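
Since the resolution (3) is a purely algebraic identity, it may be checked on any set of numbers. The sketch below does so for three classes of unequal size; the data are invented for illustration only, and the function name `resolve_sum_of_squares` is likewise an assumption.

```python
def resolve_sum_of_squares(classes):
    """Return the total, within-class and between-class sums of squares."""
    values = [x for c in classes for x in c]
    grand_mean = sum(values) / len(values)
    total = sum((x - grand_mean) ** 2 for x in values)
    within = 0.0
    between = 0.0
    for c in classes:
        class_mean = sum(c) / len(c)
        within += sum((x - class_mean) ** 2 for x in c)
        between += len(c) * (class_mean - grand_mean) ** 2
    return total, within, between

# Hypothetical data: three classes with n_i = 3, 2, 4
classes = [[3.0, 5.0, 4.0], [10.0, 12.0], [1.0, 2.0, 2.0, 7.0]]
total, within, between = resolve_sum_of_squares(classes)
print(total, within + between)   # the two numbers agree, as (3) requires
```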

98. Homogeneous population. One criterion of classification 

Now suppose that the population, from which the random sample
of $N$ values was drawn, is homogeneous with respect to the factor of
classification, that is to say, that the factor has no effect upon the
value of the variate. Then if the population is divided into classes
according to this factor, the different classes will have the same
statistical properties. In particular, they will have the same mean
$\mu$ and the same variance $\sigma^2$, which are the mean and the
variance of the population. Then, from the various sums in (3), we can
obtain three unbiased estimates of $\sigma^2$. For, by (7) and (8) of
§ 54, the expected value of the first sum in sampling is
$(N - 1)\sigma^2$; so that this sum, divided by $N - 1$, gives an
unbiased estimate of $\sigma^2$ based on $N - 1$ D.F. Similarly, the
$n_i$ values in the $i$th class constitute a random sample whose mean
is $\bar{x}_i$, so that

$$E\Big[\sum_j (x_{ij} - \bar{x}_i)^2\Big] = (n_i - 1)\sigma^2,$$

and therefore*

$$E\Big[\sum_i \sum_j (x_{ij} - \bar{x}_i)^2\Big] = \sum_i (n_i - 1)\sigma^2 = (N - h)\sigma^2,$$

where $h$ is the number of classes. Thus the second sum in (3), divided
by $N - h$, gives an unbiased estimate of $\sigma^2$ based on $N - h$
D.F. And, since the expected values of the two members of (3) must be
equal, that of the final sum* is $(h - 1)\sigma^2$; so that this sum,
divided by $h - 1$, gives an unbiased estimate of $\sigma^2$ based on
$h - 1$ D.F. The identity

$$(N - 1)\sigma^2 = (N - h)\sigma^2 + (h - 1)\sigma^2 \tag{4}$$

obtained by taking expected values of the various sums in (3),
shows that degrees of freedom are additive, the number of freedoms
corresponding to the total sum being equal to the sum of the
freedoms corresponding to the partial sums. The above results are
usually tabulated as follows:

* See also Ex. XI, 1, below.



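Source of variation     D.F.       Sum of squares                             Mean square
Between class means     $h - 1$    $\sum_i n_i(\bar{x}_i - \bar{x})^2$        $\sum_i n_i(\bar{x}_i - \bar{x})^2/(h - 1)$
Within classes          $N - h$    $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2$     $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2/(N - h)$
Total                   $N - 1$    $\sum_i \sum_j (x_{ij} - \bar{x})^2$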





In the columns headed 'D.F.' and 'Sum of squares' the items are
additive, but not in the last column which gives the estimates of
$\sigma^2$. The argument so far holds whether the population is normal
or not. In the case of a homogeneous normal population the results
follow from the distributions of the various sums in (3). For then
the first sum divided by $\sigma^2$ is distributed like $\chi^2$ with
$N - 1$ D.F., as proved in § 77. The mean value of this sum is
therefore $(N - 1)\sigma^2$. Similarly,
$\sum_j (x_{ij} - \bar{x}_i)^2/\sigma^2$ is distributed like a $\chi^2$
with $n_i - 1$ D.F.; and therefore the second sum in (3), divided by
$\sigma^2$, is distributed like $\chi^2$ with

$$\sum_i (n_i - 1) = (N - h) \text{ D.F.}$$

The mean value of this sum is therefore $(N - h)\sigma^2$. Similarly for
the final sum in (3).

In order to test the homogeneity of the estimates of $\sigma^2$ by means
of the variance ratio and the F table, it is necessary to assume that
the population is normal; for this test is founded on that assumption.
The practice is to compare the estimate 'between class means' with
that obtained 'within classes'. For the final sum in (3) represents
the variation due to the factor of classification, and the second sum
in (3) is the residual variation after the former has been removed.
If the estimate obtained between classes is significantly greater
than that within classes, we are justified in concluding that the
factor of classification exercises an influence on the value of the
variable. In that case the assumption of homogeneity is discredited,
and we must regard the population as heterogeneous. If, however,
the estimates of $\sigma^2$ are not significantly different, the test
provides no evidence against the hypothesis of a homogeneous population.
It is important to remember that the variance ratio may be tested
by means of the F table only if the two estimates of variance are
statistically independent. Since the mean of a random sample from
a normal population is distributed independently of its variance,
the two sums in the second member of (3) are independent, and the
required condition is satisfied.*

99. Calculation of the sums of squares 

In calculating the above sums of squares it is not necessary to
find the deviations from the various means. We know that the sum
of the squares of the deviations of $N$ numbers $x_s$ ($s = 1, \ldots, N$),
from their mean $\bar{x}$ is, in virtue of § 3 (11),

$$\sum_s (x_s - \bar{x})^2 = \sum_s x_s^2 - T^2/N,$$

where $T$ is the sum of the numbers. Applying this formula to the
various sums considered above we have

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - T^2/N, \tag{5}$$

the grand total $T$ being given by

$$T = \sum_i \sum_j x_{ij}.$$

Similarly, summing first for the values in the $i$th class, we have

$$\sum_j (x_{ij} - \bar{x}_i)^2 = \sum_j x_{ij}^2 - T_i^2/n_i,$$

where $T_i$ is the sum of the values in the $i$th class. Consequently

$$\sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = \sum_i \sum_j x_{ij}^2 - \sum_i T_i^2/n_i. \tag{6}$$

Subtracting (6) from (5) we find for the third sum of squares

$$\sum_i n_i(\bar{x}_i - \bar{x})^2 = \sum_i T_i^2/n_i - T^2/N. \tag{7}$$

* See also Fisher, 1925, 1 and Irwin, 1934, 4.

This result may also be obtained directly. For, since $n_i$ is the
frequency of the value $\bar{x}_i$,

$$\sum_i n_i(\bar{x}_i - \bar{x})^2 = \sum_i n_i \bar{x}_i^2 - \Big(\sum_i n_i \bar{x}_i\Big)^2 \Big/ N = \sum_i T_i^2/n_i - T^2/N$$

as stated.

Since the deviations from the means are independent of the choice
of origin, the results obtained by using (5), (6) and (7) are unaltered
by a change of origin. In other words, if all the values $x_{ij}$ are
decreased (or increased) by the same constant, the values obtained
for the three sums of squares are unchanged. The arithmetic may
often be simplified in this way, large numbers being replaced by
much smaller ones.

Example. 35 plots, of approximately equal fertility, were sown with 7
different varieties of wheat, 5 plots to each variety, the distribution
of varieties among the plots being random. The following table gives
the yields of grain in bushels per acre, the 7 columns corresponding to
the different varieties. Do the data (fictitious) indicate a
significant difference in the yields of the varieties?

    13  15  14  14  17  15  16
    11  11  10  10  15   9  12
    10  13  12  15  14  13  13
    16  18  13  17  19  14  15
    12  12  11  10  12  10  11

The classification is according to variety. The number of classes is
$h = 7$, and the number of items in each class is $n_i = 5$.
Consequently $N = 35$. The arithmetic is simplified by shifting the
origin to (say) $x = 12$. Diminishing all the yields by 12 we may
rewrite the table:

     1   3   2   2   5   3   4
    -1  -1  -2  -2   3  -3   0
    -2   1   0   3   2   1   1
     4   6   1   5   7   2   3
     0   0  -1  -2   0  -2  -1

from which we have

$T_i$ = 2, 9, 0, 6, 17, 1, 7; $T$ = 42;

$\bar{x}_i$ = 0.4, 1.8, 0, 1.2, 3.4, 0.2, 1.4.

Hence $\sum_i \sum_j x_{ij}^2 = 266$, $T^2/N = 50.4$, $\sum_i T_i^2/n_i = 92$.

The three sums of squares are therefore, by (5), (6) and (7),

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = 266 - 50.4 = 215.6,$$

$$\sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = 266 - 92 = 174,$$

$$\sum_i n_i(\bar{x}_i - \bar{x})^2 = 92 - 50.4 = 41.6.$$

In tabular form:



Source of variation    D.F.    Sum of squares    Mean square    F
Between varieties      6       41.6              6.933          1.1
Within varieties       28      174               6.214
Total                  34      215.6

For $\nu_1 = 6$ and $\nu_2 = 28$ the value 1.1 of F is not significant.
Since the estimates of variance between varieties and within varieties
are not significantly different, the experiment as a whole does not
indicate significant variation in the yields of varieties.
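
The calculation by formulae (5), (6) and (7) lends itself to mechanical checking. The following Python sketch is an illustration rather than part of the original text; it works from the yields of the example, shifts the origin to 12 as above, and uses variable names chosen here for convenience.

```python
# Rows are plots, columns are the 7 varieties (yields in bushels per acre)
yields = [
    [13, 15, 14, 14, 17, 15, 16],
    [11, 11, 10, 10, 15, 9, 12],
    [10, 13, 12, 15, 14, 13, 13],
    [16, 18, 13, 17, 19, 14, 15],
    [12, 12, 11, 10, 12, 10, 11],
]
origin = 12
shifted = [[x - origin for x in row] for row in yields]
classes = list(zip(*shifted))                  # one tuple per variety

N = sum(len(c) for c in classes)
h = len(classes)
T = sum(sum(c) for c in classes)               # grand total
raw = sum(x * x for c in classes for x in c)   # sum of squares of all values
correction = T * T / N                         # T^2 / N
class_term = sum(sum(c) ** 2 / len(c) for c in classes)

ss_total = raw - correction                    # formula (5)
ss_within = raw - class_term                   # formula (6)
ss_between = class_term - correction           # formula (7)
F = (ss_between / (h - 1)) / (ss_within / (N - h))

print(ss_total, ss_within, ss_between, round(F, 3))
```

Changing `origin` to any other constant leaves the three sums of squares unaltered, in accordance with the remark on change of origin.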

100. Two criteria of classification 

Consider next the case in which the $N$ values $x_{ij}$ of the data may
be classified according to two different criteria, A and B. For
simplicity suppose that A determines $h$ different classes, and B
determines $k$ different groups; also that the $hk$ values of the
variable are such that, in each of the $h$ classes there is one value
from each group, and in each of the $k$ groups one value from each
class. For the purpose of calculation the $hk$ values may be arranged
in a rectangular array of $h$ columns and $k$ rows, the columns
corresponding to classification A and the rows to B. The double suffix
notation will indicate that $x_{ij}$ belongs to the $i$th class and the
$j$th group; and, in the rectangular array, this value occurs in the
$i$th column and the $j$th row. As before $\bar{x}$ denotes the general
mean; $\bar{x}_i$ is the mean of the values in the $i$th class, and
$\bar{x}_j$ the mean of those in the $j$th group.

The argument leading to (3) is valid here also, the resolution
expressed by that equation being with respect to the means of
columns. We may use (3) again to resolve the sum
$\sum_i \sum_j (x_{ij} - \bar{x}_i)^2$, this time, however, with respect
to the means of rows. In doing so we take as a new variable

$$X_{ij} = x_{ij} - \bar{x}_i,$$

each value of the original variable being diminished by the mean of
the column in which it lies. The values $X_{ij}$ may be arranged in $h$
columns and $k$ rows corresponding to those of $x_{ij}$. Now the general
mean of $X_{ij}$ is zero, since the means of $x_{ij}$ and $\bar{x}_i$ are
each $\bar{x}$. Thus $\bar{X} = 0$. Similarly the mean of the values
$X_{ij}$ in the $j$th row is

$$\bar{X}_j = \bar{x}_j - \bar{x}.$$

Accordingly, on applying (3) to the quantities $X_{ij}$, but taking row
means instead of column means, we have

$$\sum_i \sum_j (X_{ij} - \bar{X})^2 = \sum_j h(\bar{X}_j - \bar{X})^2 + \sum_i \sum_j (X_{ij} - \bar{X}_j)^2,$$

or its equivalent

$$\sum_i \sum_j (x_{ij} - \bar{x}_i)^2 = \sum_j h(\bar{x}_j - \bar{x})^2 + \sum_i \sum_j (x_{ij} - \bar{x}_i - \bar{x}_j + \bar{x})^2. \tag{8}$$

Substituting this value in (3), and remembering that the numbers
$n_i$ are each equal to $k$, we have the resolution expressed by

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = k\sum_i (\bar{x}_i - \bar{x})^2 + h\sum_j (\bar{x}_j - \bar{x})^2 + \sum_i \sum_j Y_{ij}^2, \tag{9}$$

where $Y_{ij} = x_{ij} - \bar{x}_i - \bar{x}_j + \bar{x}$.

As in the preceding section, the expected value of the first member
of (9) is $(hk - 1)\sigma^2$. Similarly, on the assumption of a
homogeneous population, the expected values of the first two sums in
the second member are $(h - 1)\sigma^2$ and $(k - 1)\sigma^2$
respectively. That of the final sum is therefore
$(hk - h - k + 1)\sigma^2$. Thus the various sums in (9), divided by
$(hk - 1)$, $(h - 1)$, $(k - 1)$ and $(h - 1)(k - 1)$ respectively,
give unbiased estimates of the population variance based on degrees
of freedom represented by these divisors. When the population is
normal the four sums in (9), divided by $\sigma^2$, are distributed like
$\chi^2$ with degrees of freedom as stated above; and the mean values of
the various sums follow from this. The above results are usually
tabulated:



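Variation          D.F.                Sum of squares                          Mean square
Between classes    $h - 1$             $k\sum_i (\bar{x}_i - \bar{x})^2$       The quotient of 'sum
Between groups     $k - 1$             $h\sum_j (\bar{x}_j - \bar{x})^2$       of squares' by D.F.
Error              $(h - 1)(k - 1)$    $\sum_i \sum_j Y_{ij}^2$                in each case
Total              $hk - 1$            $\sum_i \sum_j (x_{ij} - \bar{x})^2$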






Degrees of freedom are additive, as well as sums of squares. The
variation corresponding to the factors of classification is
represented by the first two sums in the second member of (9). The
uncontrolled variation represented by the residual sum
$\sum_i \sum_j Y_{ij}^2$ is due to a variety of causes, which are
grouped under the term 'error'.

To test the hypothesis of homogeneity in the population, we
compare the two estimates of variance obtained between classes and
between groups with that obtained from error. If the factor
corresponding to either classification has a significant effect upon
the value of the variable, this will appear in the corresponding mean
square. In order to test the variance ratio by means of the F table*
the population must be assumed normal. If either of the first two
estimates of variance is significantly different from that obtained
from error, the hypothesis of homogeneity is discredited. The
significance of the difference of the means of any two classes, or
any two groups, may be tested by means of the t table, as in the
example below.

* The independence of these estimates of variance may be established
by Cochran's method. See 1934, 6.

Example. An agricultural experiment was conducted to test the effects
of change of soil (5 blocks) and variety of wheat (7 different strains)
on the yield of grain. Each block was divided into seven plots, and the
plots of each block were assigned at random to the seven varieties. The
yields, in bushels per acre, are set out in the same rectangular array
as in the example of § 99, columns corresponding to varieties and rows
to blocks. Discuss the significance of the variation of yield with the
two factors.

With origin at $x = 12$ the values of $T$, $T_i$, $\bar{x}_i$, the total
sum of squares and the sum of squares corresponding to varieties are as
already found. Show similarly that

$T_j$ = 20, -6, 6, 28, -6

$\bar{x}_j$ = 2.86, -0.86, 0.86, 4, -0.86

$$\sum_j T_j^2/7 = 184.6, \qquad \sum_j T_j^2/7 - T^2/35 = 134.2.$$

The tabulation of results is:



Source of variation    D.F.    Sum of squares    Mean square    F
Varieties              6       41.6              6.93           4.2**
Blocks                 4       134.2             33.55          20.2**
Error                  24      39.8              1.66
Total                  34      215.6









The double asterisk indicates that the value of F is significant at the
1 % level (a single asterisk denoting significance at the 5 % level).
Thus the yields of the varieties are significantly different; and the
experiment indicates a very marked variation in soil fertility from one
block to another.

We may use the t test to examine more closely the difference between the
mean yields of any two varieties, say the second and third. The
difference is $\bar{x}_2 - \bar{x}_3 = 1.8$. To test the significance of
this we use the estimate of variance obtained from 'error', and
corresponding to 24 D.F. This value, 1.66, is the estimated variance of
the yield of a single plot. The estimated variance of the mean of 5
plots is thus 1.66/5; and that of the difference of the means of two
independent samples of 5 plots each is 1.66 x 2/5 = 0.664. The S.E. of
the difference of the means of two varieties is therefore 0.815 nearly.
The value of t for the above two varieties is thus 1.8/0.815 = 2.2. For
24 D.F. this is significant at the 5 % level. The least difference,
$m$, that is significant is given by $m/0.815 = 2.06$, so that
$m = 1.68$ nearly. Show also that the S.E. of the difference between the
mean yields for two blocks is 0.69, and that a difference of 1.4 is
significant at the same level.
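
The figures of this example may be reproduced as follows. The Python sketch below is offered as an illustration only; the variable names are assumptions, and the original yields are used without change of origin, since the sums of squares do not depend on the origin.

```python
import math

# Rows are blocks (groups), columns are varieties (classes)
yields = [
    [13, 15, 14, 14, 17, 15, 16],
    [11, 11, 10, 10, 15, 9, 12],
    [10, 13, 12, 15, 14, 13, 13],
    [16, 18, 13, 17, 19, 14, 15],
    [12, 12, 11, 10, 12, 10, 11],
]
k = len(yields)          # number of groups (blocks)
h = len(yields[0])       # number of classes (varieties)

T = sum(map(sum, yields))
correction = T * T / (h * k)
ss_total = sum(x * x for row in yields for x in row) - correction

col_totals = [sum(col) for col in zip(*yields)]
row_totals = [sum(row) for row in yields]
ss_classes = sum(t * t for t in col_totals) / k - correction
ss_groups = sum(t * t for t in row_totals) / h - correction
ss_error = ss_total - ss_classes - ss_groups      # found by subtraction

ms_error = ss_error / ((h - 1) * (k - 1))
F_classes = ss_classes / (h - 1) / ms_error
F_groups = ss_groups / (k - 1) / ms_error

# t test for the difference of the means of the second and third varieties
diff = (col_totals[1] - col_totals[2]) / k
se_diff = math.sqrt(2 * ms_error / k)
t = diff / se_diff

print(f"error mean square = {ms_error:.2f}")
print(f"F (varieties) = {F_classes:.1f}, F (blocks) = {F_groups:.1f}")
print(f"difference = {diff:.1f}, S.E. = {se_diff:.3f}, t = {t:.1f}")
```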

101. The Latin square. Three criteria of classification 

In the general case of three criteria of classification, corresponding
to $h$, $k$, $p$ classes respectively, we should require a
three-dimensional generalization of the rectangular array employed
above. In the particular case for which $h = k = p$ this requirement is
obviated by an arrangement known as the Latin square. As it is
customary to use the letters A, B, C, ... to distinguish the different
classes of one of the three classifications, we may conveniently
explain the Latin square, of order $n$, as an arrangement of the $n$
letters A, B, C, ... in the form of a square array, such that each
letter occurs once in each column and once in each row. Consequently
each letter occurs $n$ times in a Latin square of order $n$. The
accompanying array is one arrangement for a Latin square of order five.

    B  D  E  A  C
    C  A  B  E  D
    D  C  A  B  E
    E  B  C  D  A
    A  E  D  C  B

The triple classification and the Latin square arrangement are
illustrated by the design of the following agricultural experiment.
The variable is the yield of grain per acre, and the object of the
experiment is to test the effect on yield due to change of manurial
treatment, and to variation of soil in each of two perpendicular
directions. A block of land is divided into $n^2$ plots, arranged in
$n$ parallel rows in one of the given directions and $n$ parallel
columns in the perpendicular direction. The $n$ different treatments
are distributed at random among the $n$ plots of each row, but in such
a way that no two plots in the same column are given the same
treatment. We thus have a Latin square in which letters correspond to
treatments, while rows and columns correspond to soil variation in
the given perpendicular directions.

The necessary formulae for calculation are obtained by an easy
extension of the foregoing results. With the same notation as in the
preceding sections, the resolution expressed by (9) is still valid. The
final sum $\sum_i \sum_j Y_{ij}^2$ may be separated into components by
applying (3) to the variable $Y_{ij}$ and resolving, not with respect to
the means of rows or columns, but with respect to the means of letters.
Let $\bar{Y}$ denote the general mean of the values $Y_{ij}$, and
$\bar{Y}_l$ the mean of the values for the $l$th letter. Then, since the
mean of each of the four terms in $Y_{ij}$ has the same magnitude
$\bar{x}$, we have $\bar{Y} = 0$. To calculate $\bar{Y}_l$ we consider
the four terms of $Y_{ij}$ separately. The mean value of $x_{ij}$
corresponding to the $l$th letter we shall denote by $\bar{x}_l$. The
mean of the values $\bar{x}_i$ for the $l$th letter is $\bar{x}$, since
each column contains the $l$th letter once only. Similarly the mean of
the values $\bar{x}_j$ for the $l$th letter is $\bar{x}$. The last term
of $Y_{ij}$ is constant, and we have the result

$$\bar{Y}_l = \bar{x}_l - \bar{x} - \bar{x} + \bar{x} = \bar{x}_l - \bar{x}.$$

Thus, on applying (3) to the values $Y_{ij}$ and the letters, we deduce

$$\sum_i \sum_j Y_{ij}^2 = n\sum_l \bar{Y}_l^2 + \sum_i \sum_j (Y_{ij} - \bar{Y}_l)^2,$$

or its equivalent

$$\sum_i \sum_j Y_{ij}^2 = n\sum_l (\bar{x}_l - \bar{x})^2 + \sum_i \sum_j (x_{ij} - \bar{x}_i - \bar{x}_j - \bar{x}_l + 2\bar{x})^2.$$

Substituting this value in (9), and putting $h = k = n$, we obtain
the required resolution

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = n\sum_i (\bar{x}_i - \bar{x})^2 + n\sum_j (\bar{x}_j - \bar{x})^2 + n\sum_l (\bar{x}_l - \bar{x})^2 + \sum_i \sum_j Z_{ij}^2, \tag{10}$$

where $Z_{ij} = x_{ij} - \bar{x}_i - \bar{x}_j - \bar{x}_l + 2\bar{x}$.

The expected value of the first member of (10) is $(n^2 - 1)\sigma^2$,
and on the assumption of a homogeneous population, that of each of the
first three sums on the right is $(n - 1)\sigma^2$, as proved in § 98.
Consequently

$$E\Big(\sum_i \sum_j Z_{ij}^2\Big) = [n^2 - 1 - 3(n - 1)]\sigma^2 = (n - 1)(n - 2)\sigma^2.$$

Thus each of the first three sums in the second member of (10),
divided by $(n - 1)$, gives an unbiased estimate of $\sigma^2$ based on
$(n - 1)$ D.F.; and the final sum, divided by $(n - 1)(n - 2)$, gives an
unbiased estimate based on that number of freedoms. In the case
of a normal population the various sums in (10), divided by $\sigma^2$,
are distributed like $\chi^2$ with D.F. equal to those of the
corresponding estimates of $\sigma^2$. The results may be tabulated as
shown below. The first three mean squares are compared with that
obtained from error in the usual manner. The significance of the
difference of the means of two classes may be tested by the t table as
illustrated earlier.

The various sums of squares may be calculated from the usual
formulae, putting $N = n^2$ and $h = k = n$. Thus

$$\sum_i \sum_j (x_{ij} - \bar{x})^2 = \sum_i \sum_j x_{ij}^2 - T^2/n^2.$$

Similarly,

$$n\sum_l (\bar{x}_l - \bar{x})^2 = \sum_l T_l^2/n - T^2/n^2,$$

where $T_l$ is the sum of the values for the $l$th letter; and so on. By
subtraction we have therefore

$$\sum_i \sum_j Z_{ij}^2 = \sum_i \sum_j x_{ij}^2 - \Big(\sum_i T_i^2 + \sum_j T_j^2 + \sum_l T_l^2\Big)\Big/ n + 2T^2/n^2.$$

It is usual to find this sum by subtraction after the other sums
have been calculated.



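Source of variation    D.F.                Sum of squares                          Mean square
Columns                $n - 1$             $n\sum_i (\bar{x}_i - \bar{x})^2$       The quotient of 'sum
Rows                   $n - 1$             $n\sum_j (\bar{x}_j - \bar{x})^2$       of squares' by D.F. in
Treatments             $n - 1$             $n\sum_l (\bar{x}_l - \bar{x})^2$       each case
Error                  $(n - 1)(n - 2)$    $\sum_i \sum_j Z_{ij}^2$
Total                  $n^2 - 1$           $\sum_i \sum_j (x_{ij} - \bar{x})^2$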






Example. An agricultural experiment was conducted on the Latin square
plan to test the effect on yield due to change of treatment (5 kinds)
and also to variation of soil in each of two perpendicular directions.
The results are set out in the Latin square below ($n = 5$), in which
letters correspond to treatments, while rows and columns correspond to
the two perpendicular directions. Are the effects on yield significant?

    A  7.4     D  8.9     E  5.8     B 12.0     C 14.3
    C 11.8     B  6.5     A  8.7     E  7.6     D  7.9
    D 10.1     C 17.9     B  9.0     A  8.5     E  7.1
    E  8.8     A 10.1     C 15.7     D 11.1     B  7.4
    B 11.8     E  8.8     D 14.3     C 18.4     A 10.1

Shifting the origin to $x = 10$ (i.e. reducing each of the above yields
by 10) the reader may easily verify that

$T_i$ = -0.1, 2.2, 3.5, 7.6, -3.2;
$T_j$ = -1.6, -7.5, 2.6, 3.1, 13.4;
$T_l$ = -5.2, -3.3, 28.1, 2.3, -11.9;
$T$ = 10.

Consequently $T^2/N = 10 \times 10/25 = 4.00$.

The various sums of squares are given by

    $\sum_i \sum_j x_{ij}^2 - T^2/N$ = 285.18 - 4.00 = 281.18  (Total)
    $\sum_i T_i^2/n - T^2/N$         =  17.02 - 4.00 =  13.02  (Columns)
    $\sum_j T_j^2/n - T^2/N$         =  50.95 - 4.00 =  46.95  (Rows)
    $\sum_l T_l^2/n - T^2/N$         = 194.89 - 4.00 = 190.89  (Treatments)
    Remainder                                        =  30.32  (Error)
The tabulation is:

Source of variation    D.F.    Sum of squares    Mean square    F
Columns                4       13.02             3.255          1.3
Rows                   4       46.95             11.737         4.6*
Treatments             4       190.89            47.722         18.9**
Error                  12      30.32             2.527
Total                  24      281.18









For $\nu_1 = 4$ and $\nu_2 = 12$ the 5 and 1 % values of F are 3.26 and
5.41. Thus the variation in rows is significant, and that due to
treatments is highly significant.

Show that the S.E. of the difference of the means of two classes is
1.005, and hence that a difference of means less than 2.2 is not
significant at the 5 % level. Thus the yield due to treatment C is
significantly greater than that due to any other treatment; and the
yield due to D is significantly greater than that due to E.
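
The sums of squares of this example may be verified by the following Python sketch, which is an illustration and not part of the original text. The layout of letters and yields is that of the square above; the variable names are assumptions, and the origin is left unshifted since the sums of squares are unaffected by it.

```python
import math

letters = ["ADEBC", "CBAED", "DCBAE", "EACDB", "BEDCA"]
yields = [
    [7.4, 8.9, 5.8, 12.0, 14.3],
    [11.8, 6.5, 8.7, 7.6, 7.9],
    [10.1, 17.9, 9.0, 8.5, 7.1],
    [8.8, 10.1, 15.7, 11.1, 7.4],
    [11.8, 8.8, 14.3, 18.4, 10.1],
]
n = len(yields)

T = sum(map(sum, yields))
correction = T * T / (n * n)
ss_total = sum(x * x for row in yields for x in row) - correction

row_totals = [sum(row) for row in yields]
col_totals = [sum(col) for col in zip(*yields)]
treat_totals = {}
for letter_row, yield_row in zip(letters, yields):
    for letter, x in zip(letter_row, yield_row):
        treat_totals[letter] = treat_totals.get(letter, 0.0) + x

ss_cols = sum(t * t for t in col_totals) / n - correction
ss_rows = sum(t * t for t in row_totals) / n - correction
ss_treat = sum(t * t for t in treat_totals.values()) / n - correction
ss_error = ss_total - ss_cols - ss_rows - ss_treat   # by subtraction

ms_error = ss_error / ((n - 1) * (n - 2))
F_cols = ss_cols / (n - 1) / ms_error
F_rows = ss_rows / (n - 1) / ms_error
F_treat = ss_treat / (n - 1) / ms_error
se_diff = math.sqrt(2 * ms_error / n)   # S.E. of a difference of two means

print(f"SS: columns {ss_cols:.2f}, rows {ss_rows:.2f}, "
      f"treatments {ss_treat:.2f}, error {ss_error:.2f}")
print(f"F: columns {F_cols:.1f}, rows {F_rows:.1f}, treatments {F_treat:.1f}")
print(f"S.E. of difference of two means = {se_diff:.3f}")
```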

102. Significance of an observed correlation ratio 

We shall next consider some applications of analysis of variance†
to testing the significance of an observed correlation ratio,
coefficient or index, and to testing the linearity of a regression.
First suppose that a value, $\eta$, of the correlation ratio of $y$ on
$x$, is obtained from a random sample of $N$ pairs of values from a
bivariate population in which $y$ is normally distributed. We wish to
test whether this value is significant of an association between the
two variables in the population. Let the $N$ pairs of values of the
variables be arranged in arrays as in Chapter V, so that the $y$'s are
classified according to the corresponding values of $x$. Then since the
subscript $i$ indicates the array, and $j$ the position in that array,
we have as in (3)

$$\sum_i \sum_j (y_{ij} - \bar{y})^2 = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2 + \sum_i n_i(\bar{y}_i - \bar{y})^2, \tag{11}$$

$\bar{y}_i$ being the mean value of $y$ in the $i$th array, and $n_i$ the
number of values in that array. In virtue of § 34 this equation
corresponds to the identity

$$N S_y^2 = N S_y^2(1 - \eta^2) + N S_y^2 \eta^2, \tag{12}$$

$S_y^2$ being the variance of $y$ in the sample.

† According to the definition given by Wishart and Sanders (see § 106,
below) these applications should be classified under 'Analysis of
Covariance'.

On the assumption that there is no association between the two
variables in the population, the $y$'s in each array may be regarded
as a random sample from the population of $y$'s. The various sums
in (11), divided by the variance $\sigma^2$ of $y$ in the population,
are then distributed like $\chi^2$ with $(N - 1)$, $(N - h)$ and
$(h - 1)$ D.F., $h$ being the number of arrays. The problem is
therefore the same as in § 98. On taking expected values of both
members of (11) we have

$$(N - 1)\sigma^2 = (N - h)\sigma^2 + (h - 1)\sigma^2. \tag{13}$$

The various sums in (11), divided by the corresponding coefficients
in (13), yield unbiased estimates of $\sigma^2$. The tabulation is:



Source of variation    D.F.       Sum of squares            Mean square
Between arrays         $h - 1$    $N S_y^2 \eta^2$          $N S_y^2 \eta^2/(h - 1)$
Within arrays          $N - h$    $N S_y^2 (1 - \eta^2)$    $N S_y^2 (1 - \eta^2)/(N - h)$
Total                  $N - 1$    $N S_y^2$



To test whether the mean square between arrays is significantly
greater than that within arrays we have

$$F = \frac{\eta^2}{1 - \eta^2} \cdot \frac{N - h}{h - 1},$$

with $\nu_1 = h - 1$, $\nu_2 = N - h$.

Example. A random sample of 79 pairs, arranged in 7 arrays of $y$'s,
gave a correlation ratio of $y$ on $x$ equal to 0.4. Is this value
significant?

Here $\nu_1 = 6$, $\nu_2 = 72$,

$$F = \frac{0.16}{0.84} \times \frac{72}{6} = \frac{16}{7} = 2.29.$$

Since the 5 % value of F is 2.23, the above value is significant at
that level. We conclude that there is association between the
variables in the population.
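
The variance ratio for a correlation ratio depends only on $\eta$, $N$ and $h$, so it is readily computed. A minimal sketch follows; the function name `correlation_ratio_F` is an assumption introduced here.

```python
def correlation_ratio_F(eta, N, h):
    """Variance ratio for a correlation ratio eta from N pairs in h arrays."""
    nu1, nu2 = h - 1, N - h
    F = (eta ** 2 / (1 - eta ** 2)) * (nu2 / nu1)
    return F, nu1, nu2

F, nu1, nu2 = correlation_ratio_F(0.4, 79, 7)
print(f"F = {F:.2f} with nu1 = {nu1}, nu2 = {nu2}")
```

The value so found is then referred to the F table for the stated degrees of freedom.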
103. Significance of a regression function

To test the significance of a regression function is to examine
whether the data of the sample indicate any degree of association
of the variables, of the type represented by the regression equation.
We shall see that, in the case of a linear regression equation, this
amounts to testing the significance of an observed value of $r$; while,
in the case of an equation of curvilinear regression, it is equivalent
to testing the significance of an observed value of the correlation
index $R$.

Consider first a linear regression equation for a random sample
of $N$ pairs of values from a bivariate normal population. If $Y_i$ is
the estimate of $y$ in the $i$th array given by the regression equation,
we know by § 27 (31) that

$$\sum_i \sum_j (y_{ij} - \bar{y})^2 = \sum_i \sum_j (y_{ij} - Y_i)^2 + \sum_i n_i(Y_i - \bar{y})^2. \tag{14}$$

The first sum of squares in the second member is due to deviations
from the regression function, and the second sum to the regression
function itself. In virtue of § 27 the various sums in (14) are equal
to the corresponding terms in the identity

$$N S_y^2 = N S_y^2(1 - r^2) + N r^2 S_y^2.$$

On the assumption of an uncorrelated normal population these sums,
divided by $\sigma^2$, are distributed like $\chi^2$ with $N - 1$,
$N - 2$ and 1 D.F. respectively, as proved in § 82. The expected values
of the various sums in (14) are therefore the corresponding terms of
the identity

$$(N - 1)\sigma^2 = (N - 2)\sigma^2 + \sigma^2,$$

and each sum thus gives an unbiased estimate of $\sigma^2$, on division
by the appropriate number of D.F. The tabulated results are:



Source of variation         D.F.       Sum of squares                          Mean square
Regression function         1          $\sum_i n_i (Y_i - \bar{y})^2$          $N r^2 S_y^2$
Deviation from the          $N - 2$    $\sum_i \sum_j (y_{ij} - Y_i)^2$        $N S_y^2 (1 - r^2)/(N - 2)$
  regression function
Total                       $N - 1$    $\sum_i \sum_j (y_{ij} - \bar{y})^2$






If the mean square due to the regression function is significantly 
greater than that due to deviations from this function, we conclude 
that there is a real association between the variables, of the type 
indicated by the regression equation. To test the significance we 
have for the variance ratio 

$$F = \frac{r^2 (N-2)}{1 - r^2}, \qquad \nu_1 = 1, \quad \nu_2 = N - 2.$$

The method is thus equivalent to testing the significance of r; and 
it will be noted that the above statistic F is the square of the statistic 
t of § 89 (8). The two tests are therefore equivalent. 

Example. A correlation of 0.5 is obtained from a random sample of 26 
pairs from a normal population. Is this value significant? 

Here $r = \tfrac{1}{2}$, $N = 26$, $F = 8^{**}$, $\nu_1 = 1$, $\nu_2 = 24$. 

From the table we see that this value of F is highly significant of correlation 
in the population. 

Obtain the same result by use of the t table. 
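
The equivalence of the two tests can be seen numerically. The sketch below assumes that the statistic t of § 89 (8) has the usual form $t = r\sqrt{N-2}/\sqrt{1-r^2}$ (that section is not reproduced here), and uses illustrative function names.

```python
import math


def regression_f(r, n):
    """Variance ratio for a linear regression: r^2 (N - 2) / (1 - r^2)."""
    return r ** 2 * (n - 2) / (1.0 - r ** 2)


def correlation_t(r, n):
    """Assumed form of the t statistic for a correlation coefficient."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


r, n = 0.5, 26
f_value = regression_f(r, n)
t_value = correlation_t(r, n)
print(f"F = {f_value:.2f}, t = {t_value:.3f}, t^2 = {t_value ** 2:.2f}")
```

With $\nu_2 = 24$, entering the t table with the value of t printed here is the same test as entering the F table with $\nu_1 = 1$.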

In the case of a correlation index R associated with a curved 
regression line, the formulae of § 41 show that the above argument 
holds if $r^2$ is replaced by $R^2$, except that the numbers of D.F. must be 
altered. If k is the number of statistics that must be calculated from 
the sample to obtain the regression equation, the number of D.F. 
associated with deviations from the regression function is $N - k$, 
and the number associated with the regression function itself is 
$k - 1$. For testing the significance of the regression function we have 
therefore 

$$F = \frac{R^2}{1 - R^2} \cdot \frac{N-k}{k-1}, \qquad \nu_1 = k-1, \quad \nu_2 = N-k.$$

104. Test for non-linearity of regression 

Considering the random sample of N pairs of values from a bivariate 
normal population, let us return to equation (11). In virtue 
of § 36 (18), the final sum in that equation may be further resolved as 

$$\sum_i n_i (\bar{y}_i - \bar{y})^2 = \sum_i n_i (\bar{y}_i - Y_i)^2 + \sum_i n_i (Y_i - \bar{y})^2, \qquad (15)$$

the various sums being equal to the corresponding terms of the 
identity 

$$N \eta^2 S_y^2 = N(\eta^2 - r^2) S_y^2 + N r^2 S_y^2.$$

The three sums in (15), divided by $\sigma^2$, are distributed like $\chi^2$ with 
$h-1$, $h-2$ and 1 D.F. respectively; and on taking expected values 
of both members of (15) we obtain the identity 

$$(h-1)\sigma^2 = (h-2)\sigma^2 + \sigma^2.$$

The various sums in (15), divided by the corresponding numbers of 
D.F., give unbiased estimates of $\sigma^2$. That obtained from the first 
sum on the right is associated with deviations of the means of arrays 
from the line of regression of y on x. On the assumption of linearity 
of regression this sum is due to sampling errors; and the estimate 
of $\sigma^2$ obtained from it should not be significantly greater than that 
derived from the sum of squares within arrays, i.e. from 

$$\sum_i \sum_j (y_{ij} - \bar{y}_i)^2 = N(1 - \eta^2) S_y^2.$$
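Tabulating as usual we have: 

| Source of variation | D.F. | Sum of squares | Mean square |
|---|---|---|---|
| Linear regression | 1 | $\sum_i n_i (Y_i - \bar{y})^2$ | $N r^2 S_y^2$ |
| Deviation of means from regression line | $h-2$ | $\sum_i n_i (\bar{y}_i - Y_i)^2$ | $N(\eta^2 - r^2) S_y^2/(h-2)$ |
| Within arrays | $N-h$ | $\sum_i \sum_j (y_{ij} - \bar{y}_i)^2$ | $N(1 - \eta^2) S_y^2/(N-h)$ |
| Total | $N-1$ | $\sum_i \sum_j (y_{ij} - \bar{y})^2$ | |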

For testing whether the second mean square is significantly greater 
than the third, we have the variance ratio 

$$F = \frac{\eta^2 - r^2}{1 - \eta^2} \cdot \frac{N-h}{h-2}, \qquad \nu_1 = h-2, \quad \nu_2 = N-h.$$

If the value of F is significant, the assumption of linearity of 
regression is discredited. It should be observed that the value of F 
depends not only on the difference $\eta^2 - r^2$, but also on $\eta^2$, N and h. 
Thus a knowledge of $\eta^2 - r^2$ is by itself insufficient to decide the 
question of linearity of regression. 

Example. A random sample of 200 pairs of values from a bivariate normal 
population, when grouped in 10 arrays of y's, gave values r = 0.3 and $\eta_y$ = 0.4. 
Are these results consistent with the assumption of linearity of regression? 

Here $N = 200$, $h = 10$, $\nu_1 = 8$, $\nu_2 = 190$ and 

$$F = \frac{0.07}{0.84} \times \frac{190}{8} = 1.98.$$

This value is just on the 5 % level of significance. The assumption of linearity 
of regression is thus rather discredited. 
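
The dependence of F on N and h, and not merely on $\eta^2 - r^2$, is easy to see by evaluating the variance ratio for the same pair of coefficients under different groupings. The following sketch uses the ratio $(\eta^2 - r^2)(N-h)/\{(1-\eta^2)(h-2)\}$; the second sample size is a hypothetical one chosen only for contrast.

```python
def nonlinearity_f(eta, r, n_pairs, n_arrays):
    """Variance ratio comparing deviations of array means from the regression
    line with the variation within arrays."""
    nu1 = n_arrays - 2
    nu2 = n_pairs - n_arrays
    f = (eta ** 2 - r ** 2) / (1.0 - eta ** 2) * nu2 / nu1
    return f, nu1, nu2


# The example of the text: N = 200, h = 10, r = 0.3, eta = 0.4.
f_text, nu1, nu2 = nonlinearity_f(0.4, 0.3, 200, 10)
print(f"N = 200: F = {f_text:.2f} with ({nu1}, {nu2}) D.F.")

# Same coefficients, hypothetical smaller sample: F is roughly halved.
f_small, _, _ = nonlinearity_f(0.4, 0.3, 100, 10)
print(f"N = 100: F = {f_small:.2f}")
```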



ANALYSIS OF COVARIANCE 

105. Resolution of the 'sum of products'. One criterion of 
classification 

Analysis of covariance has been described as 'the technique of 
testing for homogeneity in problems dealing with two or more 
correlated variables'.* Its main use is to test the significance of 
the difference between the mean values of a variate y in certain 
classes, when these have been corrected for differences in some 
concomitant variable x. The method of doing this will be explained 
shortly; but we must first study the necessary algebraical tools. 

* Wishart and Sanders, 1936, 3, p. 46. 

The resolution of the total variation into components, already 
studied in analysis of variance, has its counterpart in analysis of 
covariance; and the algebra of the two processes runs along parallel 
lines. The total covariation of a bivariate sample, represented by 
the sum of the products of the deviations of the variates from their 
means, may be resolved into components associated with different 
factors; and from these components, and the corresponding 
components of variation of x and of y, estimates of the coefficient of 
regression (or correlation) in the population are determined. The 
estimate from 'error' is tested for significance as in §§ 103, 89 or 90; 
and, if this proves to be significant, the other estimates are tested 
by comparison with it. We are thus able to estimate the effects of 
the various factors on the degree of association of the variates. 

Suppose that the data consist of N pairs of corresponding values 
of the two variates, x and y, and that these may be grouped in h 
different classes according to a certain criterion. With the double 
suffix notation, $(x_{ij}, y_{ij})$ is the jth pair of values in the ith class 
(i = 1, ..., h). The numbers of pairs in the different classes are not 
necessarily equal. Let $n_i$ be the number of pairs in the ith class, so 
that $\sum n_i = N$. As before let $\bar{x}$, $\bar{y}$ denote the general means of the 
two variables, and $\bar{x}_i$, $\bar{y}_i$ the means of their values in the ith class. 
Then 

$$\sum_j (x_{ij} - \bar{x}_i) = 0, \qquad \sum_j (y_{ij} - \bar{y}_i) = 0. \qquad (16)$$



The deviations of $x_{ij}$ and $y_{ij}$ from their general means are 
expressible as 

$$x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x}), \qquad y_{ij} - \bar{y} = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y}).$$

On forming their product, and summing over all pairs of values, 
we have 

$$\sum_i \sum_j (x_{ij} - \bar{x})(y_{ij} - \bar{y}) = \sum_i \sum_j (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i) + \sum_i n_i (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y}), \qquad (17)$$

the remaining sums disappearing in virtue of (16). Corresponding 
to this we have also the resolution of the sum of squares of the x's, 
expressed by (3), and that of the y's given by a similar formula. 

Suppose now that the N pairs of values constitute a random 
sample from a homogeneous population, in which the covariance 
is $\mu_{11}$. Then, by § 61, the expected value of the sum in the first 
member of (17) is $(N-1)\mu_{11}$. Similarly, since the values in the ith 
class may be regarded as a random sample of $n_i$ pairs, 

$$E \sum_j (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i) = (n_i - 1)\mu_{11},$$

so that 

$$E \sum_i \sum_j (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i) = \sum_i (n_i - 1)\mu_{11} = (N - h)\mu_{11}.$$

Consequently, since the expectations of the two members of (17) 
must be equal, that of the final sum in the equation must be 
$(h-1)\mu_{11}$. Thus the various sums in (17), divided by $N-1$, $N-h$ and 
$h-1$ respectively, give unbiased estimates of the covariance of the 
population, based on numbers of D.F. represented by these divisors. 
Following Wishart and Sanders* we may conveniently denote 
the three sums in (17) by $C''$, $C'$ and $C$ respectively, the equation 
being then equivalent to $C'' = C' + C$. The corresponding sums of 
squares for the x's will be denoted by $A''$, $A'$ and $A$, and those for the 
y's by $B''$, $B'$ and $B$. Then (3) and its counterpart are equivalent to 

$$A'' = A' + A, \qquad B'' = B' + B.$$

* Wishart and Sanders, 1936, 3, p. 48. 

The resolution of sums of squares and products may be combined 
in a single table as follows: 

| Source of variation | D.F. | Sums of squares | Sum of products | Coefficient of regression | Coefficient of correlation |
|---|---|---|---|---|---|
| Between classes | $h-1$ | $A$, $B$ | $C$ | $b = C/A$ | $r = C/\sqrt{AB}$ |
| Within classes | $N-h$ | $A'$, $B'$ | $C'$ | $b' = C'/A'$ | $r' = C'/\sqrt{A'B'}$ |
| Total | $N-1$ | $A''$, $B''$ | $C''$ | | |

We thus obtain different estimates of the coefficients of regression 
and correlation. By means of §§ 103, 89 or 90 we first test whether 
the estimate obtained within classes is significant of correlation in 
the population. If it proves to be significant, we proceed to test the 
differences between the class means after these have been corrected 
for regression. Incidentally we may also test the significance of the 
difference between $b$ and $b'$. 

106. Calculation of the sums of products 

The sum of products of the deviations of N pairs of numbers 
$x_s$, $y_s$ (s = 1, ..., N), from their means $\bar{x}$, $\bar{y}$ is given by 

$$\sum_s (x_s - \bar{x})(y_s - \bar{y}) = \sum_s x_s y_s - T T'/N, \qquad (18)$$

where $T = \sum_s x_s$ and $T' = \sum_s y_s$. 

Applying this formula to the sums in (17) we have 

$$C'' = \sum_i \sum_j (x_{ij} - \bar{x})(y_{ij} - \bar{y}) = \sum_i \sum_j x_{ij} y_{ij} - T T'/N, \qquad (19)$$

$T$, $T'$ being the grand totals for x and y respectively. Similarly, on 
summing first for the values in the ith class, we have 

$$C' = \sum_i \sum_j (x_{ij} - \bar{x}_i)(y_{ij} - \bar{y}_i) = \sum_i \Bigl( \sum_j x_{ij} y_{ij} - T_i T'_i/n_i \Bigr),$$

where $T_i = \sum_j x_{ij}$ and $T'_i = \sum_j y_{ij}$, 
these being the sums of the values of x and y in the ith class. 
Consequently 

$$C' = \sum_i \sum_j x_{ij} y_{ij} - \sum_i T_i T'_i/n_i. \qquad (20)$$

Subtraction of (20) from (19) gives the remaining sum 

$$C = \sum_i T_i T'_i/n_i - T T'/N, \qquad (21)$$

which may, of course, be obtained independently. 

Since the deviations from the means are independent of the choice 
of origin, the results obtained by using (19), (20) and (21) are 
unaltered by a change of the origins of x and y. In other words, if 
all the values $x_{ij}$ are decreased (or increased) by the same constant, 
and all the values $y_{ij}$ by another constant, the results obtained for 
the three sums of products are unchanged. The arithmetic may often 
be simplified in this manner, large numbers being replaced by much 
smaller ones. 
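
The formulae (19), (20) and (21), and the invariance under a change of origin, may be illustrated by a short computation. The data below are hypothetical (three classes of unequal size), and the function name is an assumption made for the sketch.

```python
def sums_of_products(classes):
    """Return (C, C', C''): between-class, within-class and total sums of
    products for a list of classes, each a list of (x, y) pairs."""
    n_total = sum(len(c) for c in classes)
    t_x = sum(x for c in classes for x, _ in c)        # grand total T
    t_y = sum(y for c in classes for _, y in c)        # grand total T'
    raw = sum(x * y for c in classes for x, y in c)    # sum of x_ij * y_ij
    by_class = sum(
        sum(x for x, _ in c) * sum(y for _, y in c) / len(c) for c in classes
    )                                                  # sum of T_i T'_i / n_i
    c_total = raw - t_x * t_y / n_total                # formula (19)
    c_within = raw - by_class                          # formula (20)
    c_between = by_class - t_x * t_y / n_total         # formula (21)
    return c_between, c_within, c_total


# Hypothetical data, purely for illustration.
data = [
    [(101, 52), (103, 55), (104, 54)],
    [(98, 49), (100, 50)],
    [(105, 57), (107, 58), (106, 60), (108, 59)],
]
# Shift the origins: subtract 100 from every x and 50 from every y.
shifted = [[(x - 100, y - 50) for x, y in c] for c in data]

original = sums_of_products(data)
moved = sums_of_products(shifted)
print("original origin:", original)
print("shifted origin: ", moved)
```

Both calls give the same three sums (up to rounding), and in each case the total equals the sum of the between-class and within-class components.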

107. Examination and elimination of the effect of regression 

The method of § 103, for testing the significance of an estimated 
regression, consists in resolving the total sum of squares of the y's 
into two components, one due to regression and the other to 
deviations from the regression line, and then comparing the estimates 
of population variance derived from these two sums. It will be 
observed that the test was applied to a sample without any attempt 
at classification, that is to say, quite apart from any resolution of 
the variation into components due to factors other than regression. 
In terms of the total sums of squares and products, $A''$, $B''$, $C''$, 
the sum of squares due to regression is expressible as 

$$N S_y^2 r^2 = B''(C''^2/A''B'') = C''^2/A'', \qquad (22)$$

and that due to deviation from the regression line is $B'' - C''^2/A''$. 
The former corresponds to 1 D.F. and the latter to $N-2$, which is 
one less than for the total sum of squares. 
Now we have seen how, after the values in the sample have been 
classified, we may resolve the total variation of the variables, and 
their covariation, into components between the means of classes 
and within classes respectively. From each set of components we 
may calculate an estimate of the regression coefficient and a line 
of regression; and the corresponding variation of the y's may be 
further resolved by the method of § 103 into a part due to regression, 
and another due to deviation from the regression line, the number 
of D.F. for the latter being one less than the total D.F. for that 
component. Applying this process to each line of the table at the end 
of § 105, we find from each a sum of squares of deviations from the 
corresponding line of regression, with D.F. as indicated in the 
following table:* 



| Source of variation | D.F. | Residual sum of squares |
|---|---|---|
| Between classes | $h-2$ | $B - C^2/A$ |
| Within classes | $N-h-1$ | $B' - C'^2/A'$ |
| Total | $N-2$ | $B'' - C''^2/A''$ |


leading to different estimates of the variance of y in the population. 
Denoting the estimate within classes by $s^2$ we have 

$$s^2 = \frac{B' - C'^2/A'}{N - h - 1}. \qquad (23)$$

This is the estimate of the variance of y after correction for 
regression. The significance of any other estimate of the variance is tested 
by comparison with it. 

The significance of any apparent regression is first tested by the 
method of § 103, applied to the sums within classes; that is to say, 
we compare the estimate $C'^2/A'$ of variance, due to regression and 
based on 1 D.F., with the above estimate $s^2$ based on $N-h-1$ D.F. 
If the regression proves to be significant we proceed to test the 
differences between class means after correction for regression. 
Subtracting the second row of the above table from the third we 
have the sum of squares $B + C'^2/A' - C''^2/A''$ with $h-1$ D.F., which 
we also compare with the sum of squares within classes. If it proves 
to be significant we conclude that there are differences between the 
class means, after these have been adjusted by the regression 
coefficient $b'$ within classes. 

* Cf. Wishart and Sanders, 1936, 3, p. 49. 

Now in the first line of the above table we have a sum of squares 
between class means, adjusted by the regression coefficient $b$, and 
corresponding to $(h-2)$ D.F. The testing of this sum in place of the 
above raises the question of the significance of the difference between 
$b$ and $b'$. This may be examined as follows. Tabulating the two sums 
of squares just mentioned, and their D.F., we may arrange the work: 

| Between classes | D.F. | Sum of squares |
|---|---|---|
| Adjusted by $b'$ | $h-1$ | $B + C'^2/A' - C''^2/A''$ |
| Adjusted by $b$ | $h-2$ | $B - C^2/A$ |
| Difference | 1 | $C^2/A + C'^2/A' - C''^2/A''$ |

Now the sum in the last row may be expressed 

$$\frac{C^2}{A} + \frac{C'^2}{A'} - \frac{(C + C')^2}{A + A'} = \frac{AA'}{A + A'}\left(\frac{C}{A} - \frac{C'}{A'}\right)^2 = \frac{AA'(b - b')^2}{A + A'}.$$

Comparing this estimate of variance, based on 1 D.F., with the 
estimate $s^2$ based on $N-h-1$, we have the variance ratio 

$$F = \frac{AA'(b - b')^2}{A + A'} \cdot \frac{N - h - 1}{B' - C'^2/A'}, \qquad \nu_1 = 1, \quad \nu_2 = N - h - 1. \qquad (24)$$

A rare value of F indicates that the two coefficients, $b$ and $b'$, are 
significantly different; in which case the factor of classification has 
an effect upon the degree of association of the variables. 

Example. To examine the relation between the yield of grain (x bushels/ 
acre) and the cost of production (y shillings/bushel) in a certain state, six 
districts were chosen at random, and in each a random selection of five farms 
was made. The results for the season are tabulated below, columns corre- 
sponding to districts. Is there any significant indication of correlation 
between the variables; and, if so, does its value vary with the district? 



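| x | y | x | y | x | y | x | y | x | y | x | y |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 3.5 | 10 | 4.7 | 16 | 2.8 | 9 | 4.5 | 12 | 3.8 | 15 | 4.7 |
| 11 | 4.0 | 9 | 5.3 | 13 | 3.0 | 12 | 3.6 | 17 | 2.2 | 11 | 5.1 |
| 18 | 2.5 | 8 | 5.6 | 15 | 2.8 | 10 | 4.0 | 19 | 2.0 | 9 | 6.0 |
| 14 | 3.7 | 12 | 4.0 | 10 | 4.2 | 13 | 3.4 | 11 | 4.6 | 14 | 4.3 |
| 12 | 4.3 | 11 | 4.4 | 12 | 3.2 | 8 | 5.5 | 14 | 3.4 | 17 | 3.9 |
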
Shifting the origin to x = 12, y = 3, and rewriting the table, the student 
will easily verify that 

$$T_i = 8,\ -10,\ 6,\ -8,\ 13,\ 6; \qquad T = 15,$$
$$T'_i = 3,\ 9,\ 1,\ 6,\ 1,\ 9; \qquad T' = 29,$$

and hence that 

$$T^2/N = 7.5, \quad T'^2/N = 28.03, \quad TT'/N = 14.5, \quad \sum_i T_i T'_i/5 = -8.2,$$
$$\sum_i \sum_j x_{ij}^2 = 259, \quad \sum_i \sum_j y_{ij}^2 = 56.96, \quad \sum_i \sum_j x_{ij} y_{ij} = -54.8.$$

Consequently 

$$A = 93.8 - 7.5 = 86.3, \quad A' = 259 - 93.8 = 165.2, \quad A'' = 259 - 7.5 = 251.5,$$
$$B = 41.8 - 28.03 = 13.77, \quad B' = 56.96 - 41.8 = 15.16, \quad B'' = 56.96 - 28.03 = 28.93,$$
$$C = -8.2 - 14.5 = -22.7, \quad C' = -54.8 + 8.2 = -46.6, \quad C'' = -54.8 - 14.5 = -69.3.$$

The tabulated results are: 



| Source of variation | D.F. | Sum of squares ($x^2$) | Sum of squares ($y^2$) | Sum of products | Coefficient of regression |
|---|---|---|---|---|---|
| Between districts | 5 | 86.3 | 13.77 | -22.7 | -0.263 |
| Within districts | 24 | 165.2 | 15.16 | -46.6 | -0.282 |
| Total | 29 | 251.5 | 28.93 | -69.3 | -0.276 |


First, to test the significance of $b'$ we have, as explained above, 

$$F = \frac{C'^2/A'}{s^2} = \frac{13.14}{0.088} \approx 149, \qquad \nu_1 = 1, \quad \nu_2 = 23.$$

Thus the value of $b'$ is highly significant of negative correlation. 

To test the differences of class means, after these have been corrected for 
regression, we calculate the estimate of variance 

$$\frac{B + C'^2/A' - C''^2/A''}{h-1} = \frac{13.77 + 13.14 - 19.09}{5} = 1.56,$$

and compare this with the estimate within classes 

$$s^2 = 2.02/23 = 0.088.$$

The variance ratio is 

$$F = 1.56/0.088 = 17.7^{**}$$

which, for $\nu_1 = 5$ and $\nu_2 = 23$, is highly significant. We conclude that the 
cost of production, corrected for differences in yield, varies significantly from 
district to district. 

By using (24) show that $b$ and $b'$ do not differ significantly. 
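
The whole of this example can be reproduced by a short computation, offered here only as a sketch. The data are entered so as to reproduce the totals quoted in the example ($T'_i = 3, 9, 1, 6, 1, 9$ and $\sum\sum y^2 = 56.96$ after the change of origin); the helper `resolve` and the variable names are assumptions made for the illustration.

```python
# One list of (x, y) pairs per district: x = yield, y = cost of production.
districts = [
    [(13, 3.5), (11, 4.0), (18, 2.5), (14, 3.7), (12, 4.3)],
    [(10, 4.7), (9, 5.3), (8, 5.6), (12, 4.0), (11, 4.4)],
    [(16, 2.8), (13, 3.0), (15, 2.8), (10, 4.2), (12, 3.2)],
    [(9, 4.5), (12, 3.6), (10, 4.0), (13, 3.4), (8, 5.5)],
    [(12, 3.8), (17, 2.2), (19, 2.0), (11, 4.6), (14, 3.4)],
    [(15, 4.7), (11, 5.1), (9, 6.0), (14, 4.3), (17, 3.9)],
]


def resolve(classes, f, g):
    """Between-class, within-class and total sums of products of f and g."""
    n = sum(len(c) for c in classes)
    tf = sum(f(p) for c in classes for p in c)
    tg = sum(g(p) for c in classes for p in c)
    raw = sum(f(p) * g(p) for c in classes for p in c)
    by_class = sum(
        sum(f(p) for p in c) * sum(g(p) for p in c) / len(c) for c in classes
    )
    return by_class - tf * tg / n, raw - by_class, raw - tf * tg / n


def get_x(pair):
    return pair[0]


def get_y(pair):
    return pair[1]


A, A_w, A_t = resolve(districts, get_x, get_x)   # A, A', A''
B, B_w, B_t = resolve(districts, get_y, get_y)   # B, B', B''
C, C_w, C_t = resolve(districts, get_x, get_y)   # C, C', C''

N, h = 30, 6
s2 = (B_w - C_w ** 2 / A_w) / (N - h - 1)                       # formula (23)
F_regression = (C_w ** 2 / A_w) / s2                            # test of b'
F_means = (B + C_w ** 2 / A_w - C_t ** 2 / A_t) / (h - 1) / s2  # adjusted means
F_slopes = (C ** 2 / A + C_w ** 2 / A_w - C_t ** 2 / A_t) / s2  # formula (24)

print(f"A, A', A'' = {A:.1f}, {A_w:.1f}, {A_t:.1f}")
print(f"C, C', C'' = {C:.1f}, {C_w:.1f}, {C_t:.1f}")
print(f"b = {C / A:.3f}, b' = {C_w / A_w:.3f}")
print(f"s^2 = {s2:.3f}")
print(f"F for b' = {F_regression:.0f}, F for class means = {F_means:.1f}")
print(f"F for b - b' = {F_slopes:.2f}")
```

Because the sketch carries full precision, its variance ratios differ slightly in the last figure from those obtained above with rounded intermediate values. The last ratio is less than unity, which bears out the closing remark that $b$ and $b'$ do not differ significantly.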

108. Two criteria of classification 

Suppose next that the N pairs of values $(x_{ij}, y_{ij})$ in the sample 
may be classified according to two different criteria, H and K, the 
former determining h different classes and the latter k different 
groups. Suppose for simplicity that one pair of values from each 
group is present in each class, and vice versa, so that N = hk. The 
pairs of values may be arranged in a rectangular array of h columns 
and k rows, in which columns correspond to classification H and 
rows to K. The pair $(x_{ij}, y_{ij})$ belongs to the ith class and the jth 
group, and appears in the ith column and the jth row. 

The resolution of the sum of products expressed in (17) is still 
valid. But the first sum in the second member of the equation may 
be further resolved. Thus if 

$$X_{ij} = x_{ij} - \bar{x}_i, \qquad Y_{ij} = y_{ij} - \bar{y}_i,$$

and we apply the resolution expressed by (17) to the sum of 
products $\sum_i \sum_j X_{ij} Y_{ij}$, making it however with respect to the means of 
rows instead of the means of columns, we find by the same argument 
as in § 100 that 

$$\sum_i \sum_j (x_{ij} - \bar{x})(y_{ij} - \bar{y}) = \sum_i k(\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y}) + \sum_j h(\bar{x}_{\cdot j} - \bar{x})(\bar{y}_{\cdot j} - \bar{y}) + \sum_i \sum_j (x_{ij} - \bar{x}_i - \bar{x}_{\cdot j} + \bar{x})(y_{ij} - \bar{y}_i - \bar{y}_{\cdot j} + \bar{y}), \qquad (25)$$

where $\bar{x}_{\cdot j}$, $\bar{y}_{\cdot j}$ denote the means of the jth row. 
Denoting the various sums in this equation by $C''$, $C_1$, $C_2$, $C'$ we may 
write it briefly 

$$C'' = C_1 + C_2 + C'.$$

On the assumption that the pairs of values constitute a random 
sample from a homogeneous population, the expected value of the 
first member of (25) is $(N-1)\mu_{11}$, where $\mu_{11}$ is the covariance in the 
population. That of the first sum in the second member has been 
shown to be $(h-1)\mu_{11}$, and similarly that of the second sum is 
$(k-1)\mu_{11}$. Consequently the expectation of the final sum is 

$$(N - h - k + 1)\mu_{11} = (h-1)(k-1)\mu_{11}.$$

On dividing the various sums by $(N-1)$, $(h-1)$, $(k-1)$ and 
$(h-1)(k-1)$ respectively, we thus obtain four unbiased estimates 
of the covariance of the homogeneous population, with D.F. 
represented by these divisors. In § 100 we obtained the corresponding 
estimates of the variance of x, and similar equations give estimates 
of the variance of y in the population. From the partial sums we 
obtain three independent estimates of the coefficients of regression 
and correlation. Denoting the sums of squares of the x's by $A''$, $A_1$, 
$A_2$, $A'$ and those of the y's by $B''$, $B_1$, $B_2$, $B'$, we may tabulate 
the results: 


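| Source of variation | D.F. | Sum of squares ($x^2$) | Sum of squares ($y^2$) | Sum of products | Coefficient of regression |
|---|---|---|---|---|---|
| Between classes | $h-1$ | $A_1$ | $B_1$ | $C_1$ | $b_1 = C_1/A_1$ |
| Between groups | $k-1$ | $A_2$ | $B_2$ | $C_2$ | $b_2 = C_2/A_2$ |
| Error | $(h-1)(k-1)$ | $A'$ | $B'$ | $C'$ | $b' = C'/A'$ |
| Total | $hk-1$ | $A''$ | $B''$ | $C''$ | |
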

The first step is to test the significance of the regression coefficient 
$b'$ obtained from the error sums; and this is done exactly as in § 107, 
except that the number of D.F. for error is here $(h-1)(k-1)$. Thus 
the residual sum of squares due to error is $B' - C'^2/A'$, with $N-h-k$ 
D.F., giving the estimate of variance of y 

$$s^2 = \frac{B' - C'^2/A'}{N - h - k}.$$

With this we compare the estimate $C'^2/A'$ obtained from the 
regression sum of squares corresponding to 1 D.F., and the result settles 
whether $b'$ is significant. If it is, we proceed to test the significance of 
the differences of the means of y in classes (or groups) after these have 
been corrected for regression on x. This may be done as follows. Take 
the first and third rows of the above table, and by addition form a 
combined source of variation, S = (classes + error), with sums of 
squares and products $A$, $B$, $C$. Thus we have: 

| Source of variation | D.F. | Sums of squares and products |
|---|---|---|
| Classes | $h-1$ | $A_1$, $B_1$, $C_1$ |
| Error | $N-h-k+1$ | $A'$, $B'$, $C'$ |
| Total, S | $N-k$ | $A$, $B$, $C$ |

where $A = A_1 + A'$, etc. From these we form the corresponding 
residual sums of squares as in § 107, viz. 

| Source of variation | D.F. | Residual sum of squares |
|---|---|---|
| Classes | $h-2$ | $B_1 - C_1^2/A_1$ |
| Error | $N-h-k$ | $B' - C'^2/A'$ |
| S | $N-k-1$ | $B - C^2/A$ |

Subtracting the second row from the third we obtain a residual 
sum of squares $B_1 + C'^2/A' - C^2/A$ corresponding to $h-1$ D.F. 
Comparing the estimate of variance obtained from this with the 
estimate $s^2$ we are able to decide whether the class means differ 
significantly after correction for regression. 

Incidentally, we may deduce a test for the significance of the 
difference $b_1 - b'$. For, on subtracting the residual sums of squares 
for classes and error from that for S, we have the sum 

$$\frac{C_1^2}{A_1} + \frac{C'^2}{A'} - \frac{C^2}{A}$$

with 1 D.F. As in § 107 this sum is expressible as 

$$\frac{A_1 A' (b_1 - b')^2}{A_1 + A'},$$

so that in testing it we are testing the difference $b_1 - b'$. Comparing 
it with the residual sum of squares for error we have a variance ratio 

$$F = \frac{A_1 A' (b_1 - b')^2}{A_1 + A'} \cdot \frac{N - h - k}{B' - C'^2/A'}, \qquad \nu_1 = 1, \quad \nu_2 = N - h - k. \qquad (26)$$

We thus determine whether $b_1$ is significantly different from $b'$; 
and we may test $b_2$ by a similar formula.* 

* For a numerical illustration see Ex. xi, 15, below. 

EXAMPLES XI 

1. Verify as follows that the expected value of the final sum in 
§ 97 (3) is $(h-1)\sigma^2$. If $\mu$ is the mean of the population, 

$$\sum_i n_i (\bar{x}_i - \bar{x})^2 = \sum_i n_i (\bar{x}_i - \mu)^2 - N(\bar{x} - \mu)^2.$$

Since $E(\bar{x}_i - \mu)^2$ is the variance of the mean of a sample of $n_i$ members, 
and $E(\bar{x} - \mu)^2$ is that of the mean of a sample of N members, it follows 
that 

$$E\Bigl[\sum_i n_i (\bar{x}_i - \bar{x})^2\Bigr] = \sum_i n_i (\sigma^2/n_i) - N\sigma^2/N = (h-1)\sigma^2.$$

Verify the mean value of $\sum_i \sum_j (x_{ij} - \bar{x}_i)^2$ similarly. 

i ) 

2. With the notation of § 100, $Y_{ij} = x_{ij} - \bar{x}_i - \bar{x}_{\cdot j} + \bar{x}$, show that 
$\sum_i Y_{ij} = 0 = \sum_j Y_{ij}$. Thus the sum of the Y's in any row or in any 
column is zero. Only $(h-1)(k-1)$ of the quantities $Y_{ij}$ are 
independent. 

3. To test the significance of the variation of the retail price of a 
certain commodity among four large cities, A, B, C and D, seven 
shops were chosen at random in each city, and the prices observed 
were as follows: 

A: 6s. 10d., 6s. 7d., 6s. 1d., 5s. 9d., 5s. 9d., 5s. 3d., 5s. 1d. 

B: 7s., 6s. 10d., 6s. 8d., 6s. 7d., 6s. 4d., 5s. 8d., 5s. 2d. 

C: 7s. 4d., 7s., 6s. 8d., 5s. 8d., 5s. 8d., 5s. 6d., 5s. 6d. 

D: 6s. 7d., 6s. 5d., 6s. 4d., 6s. 2d., 6s., 5s. 8d., 5s. 4d. 

Do the data indicate that the prices for the four cities are signi- 
ficantly different? 
The classes correspond to the cities. Expressing the prices in 
pence show that the tabulated results are: 

| Source of variation | D.F. | Sum of squares | Mean square |
|---|---|---|---|
| Between cities | 3 | 94.96 | 31.65 |
| Within cities | 24 | 1446.00 | 60.25 |
| Total | 27 | 1540.96 | |

Thus the mean square between cities is less than that within cities, 
and is therefore not significantly large. Neither is it significantly 
small because, for $\nu_1 = 24$ and $\nu_2 = 3$, a value of F in the 
neighbourhood of 2 is not significant. 

Show that the S.E. of the difference of the means for two cities is 
4.15d.; and hence for no two cities is the difference of the mean prices 
significant. 
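
A sketch of the computation for this exercise follows. The prices in pence are entered on the assumption that the price list reads as below, a reading chosen because it reproduces the tabulated sums of squares; the variable names are illustrative.

```python
import math

# Prices in pence for seven shops in each city (assumed reading of the data).
prices = {
    "A": [82, 79, 73, 69, 69, 63, 61],
    "B": [84, 82, 80, 79, 76, 68, 62],
    "C": [88, 84, 80, 68, 68, 66, 66],
    "D": [79, 77, 76, 74, 72, 68, 64],
}

n_total = sum(len(v) for v in prices.values())
grand_total = sum(sum(v) for v in prices.values())
correction = grand_total ** 2 / n_total

sum_sq = sum(x * x for v in prices.values() for x in v)
class_term = sum(sum(v) ** 2 / len(v) for v in prices.values())

ss_between = class_term - correction
ss_within = sum_sq - class_term
ss_total = sum_sq - correction

df_between = len(prices) - 1
df_within = n_total - len(prices)
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# S.E. of the difference of two class means, each based on 7 shops.
se_diff = math.sqrt(ms_within * (1 / 7 + 1 / 7))

print(f"between: SS = {ss_between:.2f}, MS = {ms_between:.2f}")
print(f"within:  SS = {ss_within:.2f}, MS = {ms_within:.2f}")
print(f"total:   SS = {ss_total:.2f}")
print(f"S.E. of difference of two means = {se_diff:.2f}d.")
```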

4. Show that, if $\bar{x}_1$ and $\bar{x}_2$ are the means of two classes of $n_1$ and 
$n_2$ members respectively, and $s^2$ is the estimate of population variance 
derived from error and corresponding to $\nu$ D.F., then $s\sqrt{1/n_1 + 1/n_2}$ 
is the S.E. of the difference of the means of the two classes, and 
hence that the statistic 

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{1/n_1 + 1/n_2}}$$

is distributed like t for $\nu$ D.F. This formula provides a method of testing 
the significance of the difference of the means of two unequal classes. 

5. In a certain large country, to test the variation in the price 
of a certain commodity with district and season, six districts were 
chosen at random, and the price was observed in each on six random 
occasions throughout the year. The observations are recorded in pence 
in the table below, in which columns correspond to seasons and rows 
to districts. Discuss the significance of the variation with season and 
with district. 

11   15   24   19   21   17
12   16   25   18   22   16
10   13   21   17   20   15
 9   14   18   18   21   13
13   16   20   16   23   14
14   17   23   20   22   15


Show that analysis of variance leads to the following results: 

| Source of variation | D.F. | Sum of squares | Mean square | F |
|---|---|---|---|---|
| Seasons | 5 | 492.33 | 98.46 | 55.5** |
| Districts | 5 | 44.33 | 8.86 | 5.0** |
| Error | 25 | 44.33 | 1.77 | |
| Total | 35 | 581.00 | | |

Both the variance ratios are highly significant. Show that the S.E. 
of the difference of the means for two seasons or two districts is 
0.77; and deduce that the prices in the first, second and last districts 
do not differ significantly. 
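
The analysis for this exercise may be sketched as follows, with the observations arranged so that rows are districts and columns are seasons. The layout of the list and the variable names are assumptions of the sketch.

```python
import math

# Rows = districts, columns = seasons (prices in pence).
table = [
    [11, 15, 24, 19, 21, 17],
    [12, 16, 25, 18, 22, 16],
    [10, 13, 21, 17, 20, 15],
    [9, 14, 18, 18, 21, 13],
    [13, 16, 20, 16, 23, 14],
    [14, 17, 23, 20, 22, 15],
]

n_rows, n_cols = len(table), len(table[0])
n_total = n_rows * n_cols
grand_total = sum(sum(row) for row in table)
correction = grand_total ** 2 / n_total

ss_total = sum(x * x for row in table for x in row) - correction
ss_districts = sum(sum(row) ** 2 for row in table) / n_cols - correction
ss_seasons = (
    sum(sum(row[j] for row in table) ** 2 for j in range(n_cols)) / n_rows
    - correction
)
ss_error = ss_total - ss_districts - ss_seasons

ms_error = ss_error / ((n_rows - 1) * (n_cols - 1))
f_seasons = ss_seasons / (n_cols - 1) / ms_error
f_districts = ss_districts / (n_rows - 1) / ms_error

# S.E. of the difference of two means, each the mean of six observations.
se_diff = math.sqrt(2 * ms_error / 6)

district_means = [sum(row) / n_cols for row in table]

print(f"seasons:   SS = {ss_seasons:.2f}, F = {f_seasons:.1f}")
print(f"districts: SS = {ss_districts:.2f}, F = {f_districts:.1f}")
print(f"error:     SS = {ss_error:.2f}, MS = {ms_error:.2f}")
print(f"total:     SS = {ss_total:.2f}")
print(f"S.E. of difference of two means = {se_diff:.2f}")
print("district means:", [round(m, 2) for m in district_means])
```

The means of the first, second and last districts differ among themselves by less than one S.E. of a difference, which is the point of the final remark of the exercise.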

6. An agricultural experiment, on the Latin square plan, gave 
the following results for the yield of wheat per acre, letters corre- 
sponding to varieties, columns to treatments and rows to blocks. 
Discuss the variation of yield with each of the factors: 



A 16   B 10   C 11   D  9   E  9
E 10   C  9   A 14   B 12   D 11
B 15   D  8   E  8   C 10   A 18
D 12   E  6   B 13   A 13   C 12
C 13   A 11   D 10   E  7   B 14


Show that the results obtained by analysis of variance are: 



| Source of variation | D.F. | Sum of squares | Mean square | F |
|---|---|---|---|---|
| Treatments | 4 | 66.56 | 16.64 | 37.8** |
| Blocks | 4 | 2.16 | 0.54 | 1.2 |
| Varieties | 4 | 122.56 | 30.64 | 69.6** |
| Error | 12 | 5.28 | 0.44 | |
| Total | 24 | 196.56 | | |

Show also that the S.E. of the difference of the means of two 
classes is 0.42; and hence that for no two blocks do the means differ 
significantly. 
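
A sketch of the Latin square computation is given below. Rows of the two lists are blocks and columns are treatments, with the variety letters held in a parallel list; this layout and the helper name are assumptions made for the illustration.

```python
import math

# Rows = blocks, columns = treatments; letters are varieties.
letters = [
    "ABCDE",
    "ECABD",
    "BDECA",
    "DEBAC",
    "CADEB",
]
yields = [
    [16, 10, 11, 9, 9],
    [10, 9, 14, 12, 11],
    [15, 8, 8, 10, 18],
    [12, 6, 13, 13, 12],
    [13, 11, 10, 7, 14],
]

m = len(yields)                       # side of the square
cells = [(letters[i][j], i, j, yields[i][j]) for i in range(m) for j in range(m)]
correction = sum(v for _, _, _, v in cells) ** 2 / (m * m)


def ss_for(key):
    """Sum of squares between the classes defined by key(cell)."""
    totals = {}
    for cell in cells:
        totals[key(cell)] = totals.get(key(cell), 0) + cell[3]
    return sum(t * t for t in totals.values()) / m - correction


ss_total = sum(v * v for _, _, _, v in cells) - correction
ss_varieties = ss_for(lambda c: c[0])
ss_blocks = ss_for(lambda c: c[1])
ss_treatments = ss_for(lambda c: c[2])
ss_error = ss_total - ss_varieties - ss_blocks - ss_treatments

ms_error = ss_error / ((m - 1) * (m - 2))
se_diff = math.sqrt(2 * ms_error / m)

print(f"treatments {ss_treatments:.2f}, blocks {ss_blocks:.2f}, "
      f"varieties {ss_varieties:.2f}")
print(f"error {ss_error:.2f}, total {ss_total:.2f}")
print(f"error mean square {ms_error:.2f}, S.E. of difference {se_diff:.2f}")
```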

7. A random sample of 100 pairs of values from a normal population, 
grouped in 10 arrays of y's, gave a correlation ratio of y on x 
equal to 0.3. Show that this value is not significant of association 
between the variables. 

8. A random sample of 150 pairs from a bivariate normal population 
when grouped in 15 arrays of y's gave values r = 0.4 and 
$\eta_y$ = 0.5. Show that these results are consistent with the 
assumption of linearity of regression of y on x. 

9. A parabola, fitted to a random sample of 45 pairs of values 
from a normal population, gave an index of correlation R = 0.3. 
Show that this value is not significant of parabolic regression. Also 
show that it would not be significant for a sample of less than 
83 pairs. 

10. Give a direct proof that the expected value of the last sum 
in § 105 (17) is $(h-1)\mu_{11}$. 

11. Give a direct proof of the formula § 106 (21) for $C$. 

12. In the example of § 107, show that the sum of squares 
represented by $B - C^2/A$, which corresponds to 4 D.F., is highly significant. 
Hence the deviations of the means of districts from the line of 
regression fitted to them are too great to be attributed to chance. 

13. Supply the details of the proof of § 108 (25). 

14. With the notation of § 107 the difference of the means of the 
y's for the pth and qth classes, when corrected for regression, is 

$$\bar{y}_p - \bar{y}_q - b'(\bar{x}_p - \bar{x}_q) \qquad (i)$$

which consists of two independent parts. The estimated variance of 
the first part is $2s^2/k$, where k is the number $n_i$ in each class, and 
$s^2$ is the error mean square $(B' - C'^2/A')/(N-h-1)$ in the analysis 
of residual variance. The estimated variance of $b'$ is $s^2/A'$. Hence 
show that the estimated variance of the difference (i) is 

$$s^2\bigl[2/k + (\bar{x}_p - \bar{x}_q)^2/A'\bigr]. \qquad (ii)$$

The quotient of the difference (i) by the square root of the expression 
(ii) is distributed like t with $N-h-1$ D.F. (Cf. Wishart, 1936, 2; 
also Wishart and Sanders, 1936, 3, pp. 53-4.) 

15. Examine, as in § 108, the covariation between the yields of 
grain and straw, as indicated by the following data.* Each of 5 
blocks was divided into 5 plots, and 5 different treatments (A, B, ..., 
E) were distributed at random among the plots of each block. The 
columns correspond to blocks, and the yields of grain and straw are 
denoted by x and y respectively: 

| | x | y | x | y | x | y | x | y | x | y |
|---|---|---|---|---|---|---|---|---|---|---|
| A: | 65 | 32 | 68 | 26 | 65 | 32 | 71 | 26 | 62 | 33 |
| B: | 75 | 38 | 54 | 20 | 71 | 32 | 71 | 28 | 64 | 29 |
| C: | 72 | 33 | 69 | 30 | 69 | 38 | 69 | 30 | 61 | 30 |
| D: | 70 | 29 | 69 | 27 | 72 | 26 | 70 | 24 | 70 | 31 |
| E: | 70 | 37 | 67 | 29 | 52 | 33 | 66 | 23 | 66 | 40 |

* Data adapted from Fisher and Eden, Journ. Agric. Sci., vol. 17 (1927), 
p. 548. 

The factors of classification are blocks and treatments. Diminishing 
all the x's by 66 and all the y's by 26, verify that 

$$T_i = 22,\ -3,\ -1,\ 17,\ -7; \qquad T = 28,$$
$$T'_i = 39,\ 2,\ 31,\ 1,\ 33; \qquad T' = 106,$$

and calculate the values of $T_j$ and $T'_j$. Hence show that the sums of 
squares and products are given by: 

| Source of variation | D.F. | Sum of squares ($x^2$) | Sum of squares ($y^2$) | Sum of products | Coefficient of regression |
|---|---|---|---|---|---|
| Blocks | 4 | 135.04 | 265.76 | 2.68 | 0.020 |
| Treatments | 4 | 98.24 | 87.36 | -64.12 | -0.653 |
| Error | 16 | 455.36 | 211.44 | 174.72 | 0.384 |
| Total | 24 | 688.64 | 564.56 | 113.28 | |

The estimate $s^2$ obtained from the residual sum of squares for 
error is 

$$s^2 = \frac{B' - C'^2/A'}{N - h - k} = \frac{144.34}{15} = 9.62.$$

To test the significance of $b'$ we compare with $s^2$ the estimate 
$C'^2/A' = 67.1$ corresponding to 1 D.F. The variance ratio is 

$$F = 67.1/9.62 = 6.9 \qquad (\nu_1 = 1,\ \nu_2 = 15).$$

Thus the value of $b'$ is decidedly significant of positive correlation 
between x and y, apart from the factors of classification. To test 
whether the yield of straw, corrected for yield of grain, varies 
significantly with the treatment, we calculate 

$$C = C_2 + C' = 110.6, \qquad A = A_2 + A' = 553.6$$

and compare the estimate of variance $(B_2 + C'^2/A' - C^2/A)/(k-1)$ 
based on 4 D.F. with $s^2$ based on 15. The former has the value 

$$(87.36 + 67.1 - 22.1)/4 = 33.09,$$

and the latter is 9.62. The variance ratio is 

$$F = 33.09/9.62 = 3.44$$

which, for $\nu_1 = 4$ and $\nu_2 = 15$, is significant at the 5 % level. We 
conclude that the yield of straw, after correction for yield of grain, 
does vary significantly with the treatment. 

Show that the difference $b' - b_2$, tested by § 108 (26), is decidedly 
significant. 
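
The figures of this exercise can be checked with the following sketch, in which rows of the list are treatments and columns are blocks. The helper `resolve` and the variable names are assumptions made for the illustration; since full precision is carried, the last figure of a ratio may differ slightly from values obtained with rounded intermediate quantities.

```python
# Rows = treatments A..E, columns = blocks; each cell is (grain x, straw y).
plots = [
    [(65, 32), (68, 26), (65, 32), (71, 26), (62, 33)],
    [(75, 38), (54, 20), (71, 32), (71, 28), (64, 29)],
    [(72, 33), (69, 30), (69, 38), (69, 30), (61, 30)],
    [(70, 29), (69, 27), (72, 26), (70, 24), (70, 31)],
    [(70, 37), (67, 29), (52, 33), (66, 23), (66, 40)],
]
k = len(plots)        # number of treatments (rows)
h = len(plots[0])     # number of blocks (columns)
N = h * k


def resolve(f, g):
    """Blocks, treatments, error and total sums of products of f and g."""
    cells = [p for row in plots for p in row]
    correction = sum(f(p) for p in cells) * sum(g(p) for p in cells) / N
    total = sum(f(p) * g(p) for p in cells) - correction
    treatments = (
        sum(sum(f(p) for p in row) * sum(g(p) for p in row) for row in plots) / h
        - correction
    )
    blocks = (
        sum(
            sum(f(row[i]) for row in plots) * sum(g(row[i]) for row in plots)
            for i in range(h)
        ) / k
        - correction
    )
    return blocks, treatments, total - blocks - treatments, total


def get_x(pair):
    return pair[0]


def get_y(pair):
    return pair[1]


A_blk, A_trt, A_err, A_tot = resolve(get_x, get_x)
B_blk, B_trt, B_err, B_tot = resolve(get_y, get_y)
C_blk, C_trt, C_err, C_tot = resolve(get_x, get_y)

s2 = (B_err - C_err ** 2 / A_err) / (N - h - k)
F_regression = (C_err ** 2 / A_err) / s2

# Combined source S = treatments + error.
A_s, C_s = A_trt + A_err, C_trt + C_err
F_treatments = (B_trt + C_err ** 2 / A_err - C_s ** 2 / A_s) / (k - 1) / s2
F_slopes = (C_trt ** 2 / A_trt + C_err ** 2 / A_err - C_s ** 2 / A_s) / s2

print(f"x^2: {A_blk:.2f}, {A_trt:.2f}, {A_err:.2f}, {A_tot:.2f}")
print(f"y^2: {B_blk:.2f}, {B_trt:.2f}, {B_err:.2f}, {B_tot:.2f}")
print(f"xy:  {C_blk:.2f}, {C_trt:.2f}, {C_err:.2f}, {C_tot:.2f}")
print(f"s^2 = {s2:.2f}, F for b' = {F_regression:.2f}")
print(f"F for adjusted treatment means = {F_treatments:.2f}")
print(f"F for b' - b_2 = {F_slopes:.2f}")
```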






CHAPTER XII 

MULTIVARIATE DISTRIBUTIONS. 
PARTIAL AND MULTIPLE CORRELATIONS 

109. Introductory. Yule's notation 

In considering a bivariate distribution we saw that, when it is 
known that the values of one variable are influenced by those of 
another, the coefficient of correlation provides a useful measure of 
the degree of association between them. But it often happens that 
the values of a variable are influenced by those of several others. 
It is known, for instance, that the statures of men are influenced 
by those of their ancestors; and the yield of grain is affected by the 
amounts of different fertilizers used. In such cases our data usually 
constitute a distribution of values of several variables. If we are 
concerned with the combined influence of a group of variables upon 
a variable not included in that group, our study is that of multiple 
regression and multiple correlation. If, however, we wish to examine 
the effect of one variable upon a second, after eliminating the effects 
of other variables, our problem is that of partial correlation. The 
analysis involved is rendered simple and compact by a notation due 
to Yule,* which has gained a fairly wide acceptance in recent years. 
Before applying this to the more general case of several variables, 
we shall pave the way for the student by reviewing the results 
obtained for two variables in Chapter iv, and expressing them in 
Yule's notation. 

Let the two variables, measured from their means, be $x_1$ and $x_2$,
with standard deviations $\sigma_1$ and $\sigma_2$. If the lines of regression of
$x_1$ on $x_2$, and of $x_2$ on $x_1$, are

$$x_1 = b_{12} x_2, \qquad x_2 = b_{21} x_1, \qquad (1)$$

respectively, the residuals, or errors of estimate of the variables
incurred by using these equations, are expressed by

$$x_{1.2} = x_1 - b_{12} x_2, \qquad x_{2.1} = x_2 - b_{21} x_1. \qquad (2)$$

* Yule, 1907, 1.

These residuals are the deviations of the representative points from
the corresponding line of regression. The values of the coefficients
of regression, $b_{12}$ and $b_{21}$, are obtained by minimizing the sums of
squares of the residuals; and this is done by equating to zero the
partial derivatives of these sums with respect to $b_{12}$ and $b_{21}$, and the
constant terms. This leads to the normal equations

$$\sum (x_1 - b_{12} x_2) = 0, \qquad \sum x_2 (x_1 - b_{12} x_2) = 0 \qquad (3)$$

in the one case, and

$$\sum (x_2 - b_{21} x_1) = 0, \qquad \sum x_1 (x_2 - b_{21} x_1) = 0 \qquad (3')$$

in the other. The first equation in each case expresses the fact that
the mean of the distribution is on the line of regression; and, with
the mean as origin, the equations of these lines have the simple
form (1). The above normal equations, expressed in the more
compact notation, are

$$\sum x_{1.2} = 0, \qquad \sum x_2 x_{1.2} = 0 \qquad (3)$$

and

$$\sum x_{2.1} = 0, \qquad \sum x_1 x_{2.1} = 0, \qquad (3')$$

respectively, the summation including all pairs of values of the
distribution. It will be observed that, whereas in Chapter IV
subscripts were used to distinguish the pairs of values in the
distribution, they are here used to distinguish the different variables. They
are no longer needed for the former purpose, since $\sum$ will throughout
denote summation over the whole distribution.

From (3) and (3') we have the familiar values of the coefficients
of regression,

$$b_{12} = \sum x_1 x_2 \Big/ \sum x_2^2, \qquad b_{21} = \sum x_1 x_2 \Big/ \sum x_1^2.$$

The coefficient of correlation, which we denote by $r_{12}$ or $r_{21}$, is
such that

$$r_{12}^2 = b_{12} b_{21}, \qquad (4)$$

and

$$b_{12} = r_{12}\,\sigma_1/\sigma_2, \qquad b_{21} = r_{21}\,\sigma_2/\sigma_1, \qquad (5)$$

$r_{12}$ having the same sign as $b_{12}$ or $b_{21}$, which is the sign of $\sum x_1 x_2$. It
should be remembered that, although $r_{12} = r_{21}$, the values of $b_{12}$
and $b_{21}$ are in general different. Lastly the mean squares of the
deviations (2) are denoted by $\sigma_{1.2}^2$ and $\sigma_{2.1}^2$ respectively. In virtue
of §26 (21) and (23), these are given by

$$\sigma_{1.2}^2 = \sigma_1^2 (1 - r_{12}^2), \qquad \sigma_{2.1}^2 = \sigma_2^2 (1 - r_{12}^2). \qquad (6)$$

The quantities $\sigma_{1.2}$ and $\sigma_{2.1}$ are referred to as standard deviations of
the first order, the order being the number of subscripts following the
point. These are called secondary subscripts, while primary subscripts
are those preceding the point. The standard deviations $\sigma_1$ and $\sigma_2$ are of
zero order. The residuals $x_{1.2}$ and $x_{2.1}$ are deviations of the first order.
The regression of $x_1$ on $x_2$ is said to be linear, if the mean of each
array of $x_1$'s is on the line of regression.
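
The relations (3) to (6) are easily checked numerically. The following Python sketch is an addition to the text: the five pairs of observations are made up for illustration, and the variable names are arbitrary. It forms the residuals $x_{1.2}$ and leaves the identities to be confirmed by inspection of the computed values.

```python
import math

# Made-up paired observations, used only to illustrate the formulae.
X1 = [2.0, 4.0, 5.0, 7.0, 12.0]
X2 = [1.0, 3.0, 2.0, 6.0, 8.0]

def centre(values):
    """Measure a variable from its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

x1, x2 = centre(X1), centre(X2)
N = len(x1)
s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))

b12 = s12 / s22                      # regression coefficient of x1 on x2
b21 = s12 / s11                      # regression coefficient of x2 on x1
r12 = s12 / math.sqrt(s11 * s22)     # total correlation

x1_2 = [a - b12 * b for a, b in zip(x1, x2)]   # residuals x_{1.2}
var_1 = s11 / N                                # sigma_1^2
var_1_2 = sum(e * e for e in x1_2) / N         # sigma_{1.2}^2
```

With any non-degenerate data the residuals sum to zero, are orthogonal to $x_2$, and satisfy (4) and (6) to rounding error.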

110. Distribution of three or more variables 

Consider next a distribution of three variables which, measured
from their means, are $x_1$, $x_2$ and $x_3$. The data consist of N sets of
corresponding values of the three variables. We enquire first as to
the best estimate of $x_1$ that can be obtained from the data in the
form of a linear function of $x_2$ and $x_3$. The regression equation is
thus of the form

$$x_1 = a + b_{12.3} x_2 + b_{13.2} x_3.$$

The 'best' estimate is interpreted in accordance with the principle 
of least squares, the constants being chosen so as to make the sum 
of the squares of the errors of estimate a minimum. The subscripts 
to the coefficients are written down on the following principle. The 
first is that of the variable for which the estimate is being found, 
and the second is that of the variable which the coefficient multiplies. 
These are primary subscripts. Separated from them by a point are 
the subscripts of the other variables that enter into the equation. 
These are secondary subscripts, and their number determines the 
order of the regression coefficient. In the above equation the 
coefficients are partial regression coefficients of the first order. The 
significance of the term 'partial' will be explained shortly. The
constant $a$ has been given no subscripts, because it will appear
immediately that its value is zero.

The sum of the squares of the residuals to be minimized is

$$\sum (x_1 - a - b_{12.3} x_2 - b_{13.2} x_3)^2.$$

Equating to zero its partial derivatives with respect to $a$ and the
$b$'s we have the normal equations

$$\sum (x_1 - a - b_{12.3} x_2 - b_{13.2} x_3) = 0,$$
$$\sum x_2 (x_1 - a - b_{12.3} x_2 - b_{13.2} x_3) = 0,$$
$$\sum x_3 (x_1 - a - b_{12.3} x_2 - b_{13.2} x_3) = 0.$$

Since the mean of each variable is zero, the first of these equations
gives $a = 0$; and the regression equation of $x_1$ on $x_2$ and $x_3$ is simply

$$x_1 = b_{12.3} x_2 + b_{13.2} x_3. \qquad (7)$$

The other two normal equations may then be written more concisely

$$\sum x_2 x_{1.23} = 0 = \sum x_3 x_{1.23}, \qquad (8)$$

in which $x_{1.23}$, defined by

$$x_{1.23} = x_1 - b_{12.3} x_2 - b_{13.2} x_3, \qquad (9)$$

is the residual, or error of estimate of $x_1$ from the regression equation
(7). Its mean is zero, since those of the other variables are zero.
Hence the S.D. of this residual, denoted by $\sigma_{1.23}$, is given by

$$\sigma_{1.23}^2 = \frac{1}{N} \sum x_{1.23}^2, \qquad (10)$$

the summation covering the whole distribution, whose total
frequency is N. This is a S.D. of order two, since it has two secondary
subscripts.

Similarly, we have the regression equation of $x_2$ on $x_1$ and $x_3$,
namely,

$$x_2 = b_{21.3} x_1 + b_{23.1} x_3, \qquad (11)$$

for which the error of estimate is

$$x_{2.13} = x_2 - b_{21.3} x_1 - b_{23.1} x_3, \qquad (12)$$

and the normal equations

$$\sum x_1 x_{2.13} = 0 = \sum x_3 x_{2.13}. \qquad (13)$$

There are also similar equations for the regression of $x_3$ on $x_1$ and $x_2$.
In general the coefficients $b_{12.3}$ and $b_{21.3}$ are different.

The normal equations (3), (8) and (13) express that the sum of the
products of corresponding values of a variable and a residual is zero,
when the subscript of the variable is included among the secondary
subscripts of the residual, the summation covering the whole
distribution. Further, if $x_{1.2}$ is the residual defined by (2), we have

$$\sum x_{1.23} x_{1.2} = \sum x_{1.23} (x_1 - b_{12} x_2) = \sum x_{1.23} x_1.$$

Similarly,

$$\sum x_{1.23} x_{1.23} = \sum x_{1.23} (x_1 - b_{12.3} x_2 - b_{13.2} x_3) = \sum x_{1.23} x_1$$

in virtue of (8). Thus it is evident that the sum of the products of two
residuals is unaltered by omitting from one of the factors any secondary
subscripts which are common to both. In virtue of this result and the
normal equations it follows that the sum of the products of two residuals
is zero if all the subscripts of the one are included among the secondary
subscripts of the other.
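
The theorems of this section may be verified on a small set of figures. The Python sketch below is an addition to the text; the six sets of values are invented, and the two normal equations (8) are solved by Cramer's rule, which is only one of several ways of obtaining $b_{12.3}$ and $b_{13.2}$.

```python
def centre(values):
    """Measure a variable from its mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    """Sum of products of corresponding values."""
    return sum(a * b for a, b in zip(u, v))

# Invented data for three variables (not from the text).
x1 = centre([3.0, 5.0, 6.0, 9.0, 12.0, 7.0])
x2 = centre([1.0, 2.0, 4.0, 5.0, 7.0, 3.0])
x3 = centre([2.0, 1.0, 5.0, 3.0, 6.0, 4.0])

s22, s33, s23 = dot(x2, x2), dot(x3, x3), dot(x2, x3)
s12, s13 = dot(x1, x2), dot(x1, x3)

# Solve the two normal equations (8) for the partial regression coefficients.
det = s22 * s33 - s23 * s23
b12_3 = (s12 * s33 - s13 * s23) / det
b13_2 = (s13 * s22 - s12 * s23) / det

x1_23 = [a - b12_3 * b - b13_2 * c for a, b, c in zip(x1, x2, x3)]  # x_{1.23}
x1_2 = [a - (s12 / s22) * b for a, b in zip(x1, x2)]                # x_{1.2}
x2_3 = [b - (s23 / s33) * c for b, c in zip(x2, x3)]                # x_{2.3}
```

Here `dot(x2, x1_23)` and `dot(x3, x1_23)` vanish to rounding error, `dot(x1_23, x1_2)` equals `dot(x1_23, x1)`, and `dot(x2_3, x1_23)` vanishes because both subscripts of $x_{2.3}$ occur among the secondary subscripts of $x_{1.23}$.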

The argument of this section is also applicable to the case of n
variables $x_1, x_2, \ldots, x_n$. In the regression equation of $x_1$ on $x_2, \ldots, x_n$,
which corresponds to (7), the coefficients have each $n - 2$ secondary
subscripts, and are therefore of that order. The residual $x_{1.23 \ldots n}$
corresponding to (9) is of order $n - 1$, as is also its S.D. $\sigma_{1.23 \ldots n}$.
The normal equations are

$$\sum x_i x_{1.23 \ldots n} = 0 \qquad (i = 2, 3, \ldots, n).$$

And the theorem concerning the omission of common secondary
subscripts is equally valid in the case of n variables.

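111. Determination of the coefficients of regression

The regression equation (7) may be interpreted as representing a
plane, called the plane of regression of $x_1$ on $x_2$ and $x_3$. The normal
equations (8) may be expressed

$$r_{12}\sigma_1\sigma_2 - b_{12.3}\sigma_2^2 - b_{13.2} r_{23}\sigma_2\sigma_3 = 0,$$
$$r_{13}\sigma_1\sigma_3 - b_{12.3} r_{23}\sigma_2\sigma_3 - b_{13.2}\sigma_3^2 = 0, \qquad (14)$$

where $r_{12}$ is the correlation between $x_1$ and $x_2$, obtained by ignoring
the values of $x_3$. This is called the total correlation of $x_1$ and $x_2$.
Similarly, $r_{23}$ is the total correlation of $x_2$ and $x_3$, and so on. These
coefficients are symmetrical in the subscripts. We may eliminate
the $b$'s between (7) and (14) by equating to zero the determinant of
their coefficients and the remaining terms. Dividing the second
and third rows of this determinant by $\sigma_2$ and $\sigma_3$ respectively, and
then the first, second and third columns by $\sigma_1$, $\sigma_2$ and $\sigma_3$ respectively,
we obtain the regression equation of $x_1$ on the other variables
in the form

$$\begin{vmatrix} x_1/\sigma_1 & x_2/\sigma_2 & x_3/\sigma_3 \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{vmatrix} = 0. \qquad (15)$$

If then $\omega$ is used to denote the determinant

$$\omega = \begin{vmatrix} 1 & r_{21} & r_{31} \\ r_{12} & 1 & r_{32} \\ r_{13} & r_{23} & 1 \end{vmatrix} \qquad (16)$$

and $\omega_{ij}$ is the cofactor of the element in the ith column and the
jth row, we may write (15) as

$$\frac{x_1}{\sigma_1}\omega_{11} + \frac{x_2}{\sigma_2}\omega_{12} + \frac{x_3}{\sigma_3}\omega_{13} = 0. \qquad (17)$$

From this form of the regression equation it follows that

$$b_{12.3} = -\frac{\sigma_1}{\sigma_2}\frac{\omega_{12}}{\omega_{11}}, \qquad b_{13.2} = -\frac{\sigma_1}{\sigma_3}\frac{\omega_{13}}{\omega_{11}}. \qquad (18)$$

The regression of $x_1$ on $x_2$ and $x_3$ is said to be linear if the mean of
each array of $x_1$'s lies on the plane of regression (17).

The residual $x_{1.23}$ may be looked upon as the deviation of the
representative point from the plane of regression. The term deviation
is therefore often used in place of residual. The variance $\sigma_{1.23}^2$ of
$x_{1.23}$ may be expressed in terms of $\omega$, $\omega_{11}$ and $\sigma_1$. For

$$\sum x_{1.23}^2 = \sum x_1 x_{1.23} = \sum x_1 (x_1 - b_{12.3} x_2 - b_{13.2} x_3).$$

This is equivalent to

$$\sigma_{1.23}^2 = \sigma_1^2 - b_{12.3} r_{12}\sigma_1\sigma_2 - b_{13.2} r_{13}\sigma_1\sigma_3,$$

and, on eliminating the $b$'s between this equation and (14), we have
a result which may be expressed

$$\begin{vmatrix} 1 - \sigma_{1.23}^2/\sigma_1^2 & r_{12} & r_{13} \\ r_{21} & 1 & r_{23} \\ r_{31} & r_{32} & 1 \end{vmatrix} = 0,$$

or

$$\omega - \omega_{11}\,\sigma_{1.23}^2/\sigma_1^2 = 0.$$

Consequently

$$\sigma_{1.23}^2 = \sigma_1^2\,\omega/\omega_{11}. \qquad (19)$$

The above argument clearly holds for the more general case of
n variables. In place of (16) we then have a determinant $\omega$ of order
n; and the regression equation of $x_1$ on $x_2, \ldots, x_n$ is

$$\frac{x_1}{\sigma_1}\omega_{11} + \frac{x_2}{\sigma_2}\omega_{12} + \cdots + \frac{x_n}{\sigma_n}\omega_{1n} = 0. \qquad (20)$$

From this we obtain, as in (18), the regression coefficients of order
$n - 2$. In place of (19) we have an equation giving the variance of
the deviation $x_{1.23 \ldots n}$, namely,

$$\sigma_{1.23 \ldots n}^2 = \sigma_1^2\,\omega/\omega_{11}. \qquad (21)$$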

Example. Let $x_1$, $x_2$, $x_3$ in. be the excesses of the heights of father, mother
and son respectively above their mean values. A distribution of these variables
gave the following approximate correlations and standard deviations:*

$$r_{12} = 0.28, \quad r_{23} = 0.49, \quad r_{31} = 0.51,$$
$$\sigma_1 = 2.7, \quad \sigma_2 = 2.4, \quad \sigma_3 = 2.7.$$

Show that the regression equation of $x_3$ on $x_1$ and $x_2$ is

$$x_3 = 0.40 x_1 + 0.42 x_2,$$

and deduce that, if the mean heights of father, mother and son are 67.68,
62.48 and 68.65 in. respectively, the regression equation for actual heights
$X_1$, $X_2$, $X_3$ is

$$X_3 - 68.65 = 0.40 (X_1 - 67.68) + 0.42 (X_2 - 62.48).$$

Also show that $\sigma_{3.12} = 2.1$.
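
The arithmetic of this example can be sketched in Python as follows. This is an addition to the text: the helper names `cofactor`, `det3` and `partial_regression` are arbitrary, and formula (18) is applied with the subscripts relabelled so that the son's height is the dependent variable.

```python
import math

def cofactor(m, i, j):
    """Signed 2x2 minor of the 3x3 matrix m for row i, column j."""
    rows = [a for a in range(3) if a != i]
    cols = [b for b in range(3) if b != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def det3(m):
    """Determinant by expansion along the first row."""
    return sum(m[0][j] * cofactor(m, 0, j) for j in range(3))

# Total correlations and standard deviations quoted in the example
# (index 0 = father, 1 = mother, 2 = son).
r = [[1.00, 0.28, 0.51],
     [0.28, 1.00, 0.49],
     [0.51, 0.49, 1.00]]
sigma = [2.7, 2.4, 2.7]

def partial_regression(i, j):
    """b_{ij.k} = -(sigma_i / sigma_j) * omega_ij / omega_ii, as in (18)."""
    return -(sigma[i] / sigma[j]) * cofactor(r, i, j) / cofactor(r, i, i)

b31_2 = partial_regression(2, 0)   # coefficient of x1 in the equation for x3
b32_1 = partial_regression(2, 1)   # coefficient of x2 in the equation for x3
sd3_12 = sigma[2] * math.sqrt(det3(r) / cofactor(r, 2, 2))   # as in (19)
```

To two decimal places the coefficients agree with 0.40 and 0.42, and the standard deviation with 2.1.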



* Modified data from Pearson and Lee, Biometrika, vol. 2 (1903), pp.
357-462.

112. Multiple correlation

We proceed to find the correlation between the variable $x_1$ and
its estimate from the regression equation (7). This correlation,
which is an indication of the agreement between $x_1$ and its estimate,
is called the coefficient of multiple correlation between $x_1$ and the two
variables $x_2$ and $x_3$, and is denoted* by $R_{1(23)}$. It may be determined
as follows. If $e_{1.23}$ is the estimate of $x_1$ given by (7), we have

$$e_{1.23} = b_{12.3} x_2 + b_{13.2} x_3$$

or

$$e_{1.23} = x_1 - x_{1.23}. \qquad (22)$$

The mean value of $e_{1.23}$ is zero, since those of $x_1$ and $x_{1.23}$ are both
zero. The sum of the products of $x_1$ and $e_{1.23}$ is

$$\sum x_1 e_{1.23} = \sum x_1 (x_1 - x_{1.23}) = \sum x_1^2 - \sum x_{1.23}^2 = N(\sigma_1^2 - \sigma_{1.23}^2).$$

Also

$$\sum e_{1.23}^2 = \sum (x_1 - x_{1.23})^2 = \sum x_1^2 - \sum x_{1.23}^2 = N(\sigma_1^2 - \sigma_{1.23}^2).$$

Consequently the coefficient of correlation between $x_1$ and $e_{1.23}$ is
given by

$$R_{1(23)} = \frac{\sum x_1 e_{1.23}}{\sqrt{\sum x_1^2 \sum e_{1.23}^2}} = \frac{\sqrt{\sigma_1^2 - \sigma_{1.23}^2}}{\sigma_1}.$$

This is the required coefficient of multiple correlation. The result
may be expressed

$$\sigma_{1.23}^2 = \sigma_1^2 (1 - R_{1(23)}^2), \qquad (23)$$

which is analogous to (6) and to §41 (46). Comparing (23) and (19)
we see that

$$1 - R_{1(23)}^2 = \omega/\omega_{11}, \qquad (24)$$

whence

$$R_{1(23)}^2 = 1 - \omega/\omega_{11} = 1 - \frac{\omega}{1 - r_{23}^2} = \frac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}, \qquad (25)$$

which expresses the multiple correlation in terms of the total
correlations between the pairs of variables.

* Some writers prefer the notation $R_{1.23}$.

We may note that $R_{1(23)}$ is never negative. For, from the above
argument, $\sum x_1 e_{1.23} = \sum e_{1.23}^2$, which cannot be negative. Further,
when $R_{1(23)} = 1$ it follows from (23) that $\sigma_{1.23}^2 = 0$, which requires
all the deviations $x_{1.23}$ to be zero, so that $x_1$ is given accurately by
the regression equation. In this case $x_1$ is a linear function of $x_2$
and $x_3$.

The argument is also valid for a distribution of n variables. The
multiple correlation $R_{1(23 \ldots n)}$ is the correlation between $x_1$ and its
estimate $e_{1.23 \ldots n}$ from the regression equation (20); and the argument
leads to the corresponding formula

$$\sigma_{1.23 \ldots n}^2 = \sigma_1^2 (1 - R_{1(23 \ldots n)}^2), \qquad (23')$$

and

$$1 - R_{1(23 \ldots n)}^2 = \omega/\omega_{11}. \qquad (24')$$

If the multiple correlation is equal to unity, $x_1$ is a linear function
of $x_2, \ldots, x_n$.

Example 1. Prove the following relations:

$$\sum x_{1.23} e_{1.23} = 0,$$

$$\sum x_1^2 = \sum x_{1.23}^2 + b_{12.3} \sum x_1 x_2 + b_{13.2} \sum x_1 x_3.$$

Example 2. Show that, for the example in the preceding section,

$$R_{3(12)} = 0.63.$$
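
Formula (25) lends itself to direct computation. The short Python sketch below is an addition to the text; the function name `multiple_correlation` is arbitrary, and the subscripts of (25) are relabelled so that the son's height is the dependent variable.

```python
import math

def multiple_correlation(r_ab, r_ac, r_bc):
    """R_{a(bc)} from the three total correlations, following (25)."""
    r_squared = (r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc) / (1 - r_bc ** 2)
    return math.sqrt(r_squared)

# Son (3) on father (1) and mother (2): r31 = 0.51, r32 = 0.49, r12 = 0.28.
R3_12 = multiple_correlation(0.51, 0.49, 0.28)
```

The value obtained agrees, to two decimal places, with that stated in Example 2.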

113. Partial correlation 

Consider next the correlation between the deviations $x_{1.3}$ and
$x_{2.3}$. Since $x_{1.3}$ is the deviation of $x_1$ from its estimate in terms of
$x_3$, we may regard it as that part of the variable $x_1$ which remains
after the influence of $x_3$ has been eliminated, as far as can be done by
a linear equation. A similar interpretation can be given to $x_{2.3}$.
Hence the correlation between these deviations may be looked upon
as the correlation between $x_1$ and $x_2$ after the influence of $x_3$ has been
eliminated. We denote this correlation by $r_{12.3}$, and call it the partial
correlation between $x_1$ and $x_2$ in the trivariate distribution. Having
one secondary subscript, $r_{12.3}$ is a partial correlation of the first
order. It will be remembered that, in calculating the total correlation
$r_{12}$, the values of $x_3$ are simply ignored. It is therefore a correlation
calculated on the assumption that the variables $x_1$ and $x_2$ are
influenced only by each other, and not by any other variable.

To find the partial correlation $r_{12.3}$ we observe that, by the
theorems of §110,

$$0 = \sum x_{2.3} x_{1.23} = \sum x_{2.3} (x_1 - b_{12.3} x_2 - b_{13.2} x_3) = \sum x_{1.3} x_{2.3} - b_{12.3} \sum x_{2.3}^2,$$

so that

$$b_{12.3} = \sum x_{1.3} x_{2.3} \Big/ \sum x_{2.3}^2. \qquad (26)$$

From this result it follows that $b_{12.3}$ is the coefficient of regression
of $x_{1.3}$ on $x_{2.3}$. Similarly, $b_{21.3}$ is the coefficient of regression of $x_{2.3}$
on $x_{1.3}$; and the coefficient of correlation between these deviations
is therefore given by

$$r_{12.3}^2 = b_{12.3} b_{21.3}. \qquad (27)$$

Since $\sigma_{1.3}$ and $\sigma_{2.3}$ are the standard deviations of $x_{1.3}$ and $x_{2.3}$, the
coefficient of partial correlation is connected with the coefficient of
partial regression by the usual formulae

$$b_{12.3} = r_{12.3}\,\sigma_{1.3}/\sigma_{2.3}, \qquad b_{21.3} = r_{12.3}\,\sigma_{2.3}/\sigma_{1.3}. \qquad (28)$$

And it is now clear why the above $b$'s are called coefficients of partial
regression. The correlation $r_{12.3}$ has the same sign as $b_{12.3}$, which is
the sign of $-\omega_{12}$ by (18). Also, in virtue of (27), $r_{12.3}$ is symmetrical
in the primary subscripts. If the values of the $b$'s are substituted
from (18) in (27) we obtain

$$r_{12.3}^2 = \frac{\omega_{12}^2}{\omega_{11}\omega_{22}},$$

so that

$$r_{12.3} = -\frac{\omega_{12}}{\sqrt{\omega_{11}\omega_{22}}} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}. \qquad (29)$$

Similar formulae may be written down for $r_{13.2}$ and $r_{23.1}$. The partial
correlations of the first order are thus expressible in terms of the
total correlation coefficients.

The partial correlation between $x_1$ and $x_2$ in the trivariate distribution
is sometimes defined as the correlation between $x_1$ and $x_2$
for a constant value of $x_3$. In general, however, this correlation will
depend upon the constant value of $x_3$ selected. In certain special
cases the second definition agrees with the first, and the result is
then the same for all constant values of $x_3$. Necessary and sufficient
conditions* for this agreement may be stated:

(a) In the bivariate distribution of $x_1$ and $x_3$ ($x_2$ being ignored),
the regression of $x_1$ on $x_3$ must be linear, and the standard deviations
of all the $x_1$ arrays ($x_3$ = const.) must be equal.

(b) In the trivariate distribution the regression of $x_1$ on $x_2$ and
$x_3$ must be linear, and the standard deviations of all the $x_1$ arrays
($x_2$ = const., $x_3$ = const.) must be equal.

These conditions are satisfied in the normal trivariate distribution
to be considered in §116.

For a distribution of n variables there are partial correlations of
all orders from 1 to $n - 2$. Thus if (k) denotes a definite group of
secondary subscripts not including i and j, the correlation between
the deviations $x_{i.(k)}$ and $x_{j.(k)}$, denoted by $r_{ij.(k)}$, is a partial correlation
of order equal to the number m of subscripts in (k). It is connected
with the regression coefficients for these deviations, and their
standard deviations $\sigma_{i.(k)}$ and $\sigma_{j.(k)}$, by formulae analogous to (27) and
(28), namely,

$$r_{ij.(k)}^2 = b_{ij.(k)} b_{ji.(k)}, \qquad (30)$$

and

$$b_{ij.(k)} = r_{ij.(k)}\,\sigma_{i.(k)}/\sigma_{j.(k)}. \qquad (31)$$

If (k) includes all the subscripts but i and j, we find as in (29)

$$r_{ij.(k)} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\omega_{jj}}},$$

the symbols on the right being cofactors in the determinant $\omega$ of
order n.

A partial correlation of order $m + 1$ is expressible in terms of
those of order m by an equation of the same form as (29), with a set
(k) of secondary subscripts added to each coefficient in the formula.
Thus

$$r_{ij.h(k)} = \frac{r_{ij.(k)} - r_{ih.(k)} r_{jh.(k)}}{\sqrt{(1 - r_{ih.(k)}^2)(1 - r_{jh.(k)}^2)}},$$

where h, i, j are unequal.

Example. Prove that, for the example in §111,

$$r_{31.2} = 0.445, \qquad r_{32.1} = 0.42.$$
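
The step from order m to order $m + 1$ suggests a recursive computation. The Python sketch below is an addition to the text, with variables numbered from 0; the function name `partial_corr` and the equicorrelated four-variable matrix are illustrative assumptions, the latter chosen only because its partial correlations are easy to find by hand.

```python
import math

def partial_corr(r, i, j, given=()):
    """Partial correlation of variables i and j with those in `given` eliminated.

    r is the matrix of total correlations. Each call removes one secondary
    subscript by the first-order formula (29) applied to correlations of the
    next lower order.
    """
    if not given:
        return r[i][j]
    h, rest = given[0], given[1:]
    r_ij = partial_corr(r, i, j, rest)
    r_ih = partial_corr(r, i, h, rest)
    r_jh = partial_corr(r, j, h, rest)
    return (r_ij - r_ih * r_jh) / math.sqrt((1 - r_ih ** 2) * (1 - r_jh ** 2))

# Heights example: index 0 = father, 1 = mother, 2 = son.
heights = [[1.00, 0.28, 0.51],
           [0.28, 1.00, 0.49],
           [0.51, 0.49, 1.00]]
r31_2 = partial_corr(heights, 2, 0, (1,))
r32_1 = partial_corr(heights, 2, 1, (0,))

# Hypothetical four variables, every total correlation equal to 0.5.
equal = [[1.0 if a == b else 0.5 for b in range(4)] for a in range(4)]
r12_34 = partial_corr(equal, 0, 1, (2, 3))
```

For the equicorrelated matrix every first-order partial correlation is 1/3 and every second-order one is 1/4, whichever secondary subscript is eliminated first.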
* For a proof see Camp, 1934, 1, pp. 341-2.

114. Reduction formula for the order of a standard deviation

A S.D. of any order may be expressed in terms of a S.D. and a
correlation coefficient of lower order. Thus, since

$$\sum x_{1.23}^2 = \sum x_{1.2} (x_1 - b_{12.3} x_2 - b_{13.2} x_3) = \sum x_{1.2}^2 - b_{13.2} \sum x_{1.2} x_{3.2},$$

we have, on dividing by N and using (26) with subscripts interchanged,

$$\sigma_{1.23}^2 = \sigma_{1.2}^2 (1 - b_{13.2} b_{31.2}) = \sigma_{1.2}^2 (1 - r_{13.2}^2), \qquad (33)$$

which is of the same form as (6). Since $\sigma_{1.23}^2$ is symmetrical in the
secondary subscripts, the subscripts 2 and 3 may be interchanged
in the second member of (33). Substituting the value of $\sigma_{1.2}^2$ given
by (6) we may also write the result

$$\sigma_{1.23}^2 = \sigma_1^2 (1 - r_{12}^2)(1 - r_{13.2}^2). \qquad (34)$$

Similar formulae may be written down by cyclic permutation of the
subscripts. The equations (33) and (34) show how a S.D. may be
expressed in terms of one of lower order, and one or more of the
correlation coefficients. Also from (33) it follows that $\sigma_{1.23} \le \sigma_{1.2}$.
The estimate of $x_1$ from $x_2$ and $x_3$ is thus in general better than the
estimate from $x_2$ alone, being just as good only if $r_{13.2}$ is zero.
Further, on comparing (34) with (23) we see that

$$1 - R_{1(23)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2), \qquad (35)$$

which is in agreement with the values already found for $R_{1(23)}$ and
$r_{13.2}$ in terms of the total correlations. From (35) it is clear that

$$1 - R_{1(23)}^2 \le 1 - r_{12}^2,$$

so that

$$R_{1(23)}^2 \ge r_{12}^2.$$

Since $R_{1(23)}$ is symmetrical in the subscripts 2 and 3, it follows that
this coefficient of multiple correlation is not numerically less than
either $r_{12}$ or $r_{13}$. If then $R_{1(23)}$ is zero, both $r_{12}$ and $r_{13}$ must be zero,
and $x_1$ is then uncorrelated with either $x_2$ or $x_3$.
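
The agreement asserted in (35) may be checked on figures. The Python sketch below is an addition to the text; it borrows the total correlations and $\sigma_1$ of Examples XII, 1, and computes $r_{13.2}$ from the first-order formula (29) with subscripts interchanged.

```python
import math

r12, r13, r23 = 0.70, 0.60, 0.40   # total correlations of Examples XII, 1
sigma1 = 3.0

# First-order partial correlation r_{13.2}, from (29) with 2 and 3 interchanged.
r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))

# Squared multiple correlation R_{1(23)}^2 from (25).
R_squared = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)

lhs = 1 - R_squared                          # first member of (35)
rhs = (1 - r12 ** 2) * (1 - r13_2 ** 2)      # second member of (35)
sd1_23 = sigma1 * math.sqrt(rhs)             # sigma_{1.23} by (34)
```

The two members of (35) agree to rounding error, $R_{1(23)}^2$ exceeds both $r_{12}^2$ and $r_{13}^2$, and `sd1_23` rounds to the value 1.87 quoted in that example.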

The same argument shows that, in the case of n variables, the
equation (33) has the generalization

$$\sigma_{i.j(k)}^2 = \sigma_{i.(k)}^2 (1 - r_{ij.(k)}^2), \qquad (36)$$

where (k) denotes as usual a group of secondary subscripts. Repeated
application of this formula leads to a generalization of (34), namely,

$$\sigma_{1.23 \ldots n}^2 = \sigma_1^2 (1 - r_{12}^2)(1 - r_{13.2}^2)(1 - r_{14.23}^2) \cdots (1 - r_{1n.23 \ldots (n-1)}^2). \qquad (37)$$

Comparing this with (23') we see that

$$1 - R_{1(23 \ldots n)}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2) \cdots (1 - r_{1n.23 \ldots (n-1)}^2). \qquad (38)$$

If $R_{1(23 \ldots n)} = 0$ each of the correlations in the second member is
zero, and also each of the coefficients $r_{12}, r_{13}, \ldots, r_{1n}$. Thus $x_1$ is then
uncorrelated with any of the other variables.

115. Reduction formula for the order of a regression coefficient 

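To express a coefficient of regression in terms of coefficients of
lower order we may proceed as follows. First

$$\sum x_{1.3} x_{2.3} = \sum x_1 (x_2 - b_{23} x_3).$$

Then, since $b_{23} = b_{32}\sigma_2^2/\sigma_3^2$, we may write the above equation, in
virtue of (26),

$$\sigma_{2.3}^2\, b_{12.3} = b_{12}\sigma_2^2 - b_{13}\sigma_3^2 \cdot b_{32}\sigma_2^2/\sigma_3^2,$$

or, by means of (6),

$$\sigma_2^2 (1 - r_{23}^2)\, b_{12.3} = \sigma_2^2 (b_{12} - b_{13} b_{32}).$$

Thus we have the required reduction formula

$$b_{12.3} = \frac{b_{12} - b_{13} b_{32}}{1 - b_{23} b_{32}}. \qquad (39)$$

This may be expressed in terms of correlations. For, in virtue of
(28), it is equivalent to

$$r_{12.3}\,\frac{\sigma_{1.3}}{\sigma_{2.3}} = \frac{\sigma_1}{\sigma_2} \cdot \frac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2}.$$

Substituting the values of $\sigma_{1.3}$ and $\sigma_{2.3}$ given by equations of the
form (6), we find (29) again.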
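
As a numerical check on (39), the Python sketch below, which is an addition to the text, computes $b_{12.3}$ both from the reduction formula and from the total correlations. The figures are those of Examples XII, 1; the helper `b` is an arbitrary name for the zero-order formula (5).

```python
r12, r13, r23 = 0.70, 0.60, 0.40   # total correlations of Examples XII, 1
s1, s2, s3 = 3.0, 4.0, 5.0         # standard deviations of Examples XII, 1

def b(r, s_dependent, s_independent):
    """Zero-order regression coefficient, as in (5)."""
    return r * s_dependent / s_independent

b12, b13 = b(r12, s1, s2), b(r13, s1, s3)
b23, b32 = b(r23, s2, s3), b(r23, s3, s2)

# Reduction formula (39).
b12_3_reduction = (b12 - b13 * b32) / (1 - b23 * b32)

# The same coefficient written directly in terms of the total correlations,
# which is what (18) gives when the cofactors are written out.
b12_3_direct = (s1 / s2) * (r12 - r13 * r23) / (1 - r23 ** 2)
```

Both expressions give the same coefficient, approximately 0.411.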

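In the case of n variables, by similar reasoning, we arrive at the
generalization of (39),

$$b_{12.3(k)} = \frac{b_{12.(k)} - b_{13.(k)} b_{32.(k)}}{1 - b_{23.(k)} b_{32.(k)}}, \qquad (40)$$

where (k) has the usual significance.

116. Normal distribution

A generalization of the bivariate normal distribution to the case
of three variables may be obtained as follows.* First suppose that
the variables $x_2$ and $x_3$ are normally distributed about zero means
with correlation $r_{23}$. Then the probability that a pair of values chosen
at random will fall in the interval $dx_2\,dx_3$ is

$$dP_2 = \frac{dx_2\,dx_3}{2\pi\sigma_2\sigma_3\sqrt{\omega_{11}}} \exp\left[-\frac{1}{2\omega_{11}}\left(\frac{x_2^2}{\sigma_2^2} - \frac{2 r_{23} x_2 x_3}{\sigma_2\sigma_3} + \frac{x_3^2}{\sigma_3^2}\right)\right],$$

where $\omega_{11} = 1 - r_{23}^2$. Next assume that the regression of $x_1$ on $x_2$
and $x_3$ is linear, and that in each $x_1$ array the variable is normally
distributed with S.D. which is the same for each of these arrays.
Then, since the mean of each array is on the plane of regression (7),
the S.D. of each array is $\sigma_{1.23}$, given by

$$\sigma_{1.23}^2 = \sigma_1^2\,\omega/\omega_{11}. \qquad (41)$$

Consequently the probability that $x_1$, chosen at random in an
assigned array, will fall in the interval $dx_1$ is

$$dP_1 = \frac{dx_1}{\sigma_{1.23}\sqrt{2\pi}} \exp\left(-\frac{x_{1.23}^2}{2\sigma_{1.23}^2}\right) = \frac{dx_1}{\sigma_1}\sqrt{\frac{\omega_{11}}{2\pi\omega}} \exp\left[-\frac{1}{2\omega\omega_{11}}\left(\omega_{11}\frac{x_1}{\sigma_1} + \omega_{12}\frac{x_2}{\sigma_2} + \omega_{13}\frac{x_3}{\sigma_3}\right)^2\right]$$

by (41) and (17). On forming the product $dP_1\,dP_2$ we have the
probability that a set of values of the three variables, chosen at
random, will fall in the interval $dx_1\,dx_2\,dx_3$, as

$$dP = \frac{dx_1\,dx_2\,dx_3}{(2\pi)^{3/2}\sigma_1\sigma_2\sigma_3\sqrt{\omega}} \exp(-\tfrac{1}{2}\phi), \qquad (42)$$

where, after a little reduction, it will be found that

$$\phi = \frac{1}{\omega}\left(\omega_{11}\frac{x_1^2}{\sigma_1^2} + \omega_{22}\frac{x_2^2}{\sigma_2^2} + \omega_{33}\frac{x_3^2}{\sigma_3^2} + 2\omega_{12}\frac{x_1 x_2}{\sigma_1\sigma_2} + 2\omega_{23}\frac{x_2 x_3}{\sigma_2\sigma_3} + 2\omega_{31}\frac{x_3 x_1}{\sigma_3\sigma_1}\right). \qquad (43)$$

The symmetry of (42) and (43) in the subscripts shows that the
properties of the three variables are similar. Thus all the regressions
are linear. The variance of each array of $x_2$'s in the trivariate
distribution is $\omega\sigma_2^2/\omega_{22} = \sigma_{2.13}^2$, and that of each array of $x_3$'s is
$\omega\sigma_3^2/\omega_{33} = \sigma_{3.12}^2$.

* Cf. Rietz, 1927, 2, pp. 106-7.

We may verify the statement made in §113 that, in the present
case, the correlation between $x_1$ and $x_2$ for a constant value of $x_3$
is the partial correlation $r_{12.3}$. For $\phi$ may be expressed in the form

$$\phi = \frac{1}{\omega}\left[\omega_{11}\frac{(x_1 - b_{13} x_3)^2}{\sigma_1^2} + 2\omega_{12}\frac{(x_1 - b_{13} x_3)(x_2 - b_{23} x_3)}{\sigma_1\sigma_2} + \omega_{22}\frac{(x_2 - b_{23} x_3)^2}{\sigma_2^2}\right] + \frac{x_3^2}{\sigma_3^2}, \qquad (44)$$

and, on comparing (42) with §38 (28), we see that, for a constant
value of $x_3$, $x_1$ and $x_2$ are normally distributed about means $b_{13} x_3$
and $b_{23} x_3$ respectively, with correlation

$$-\frac{\omega_{12}}{\sqrt{\omega_{11}\omega_{22}}} = r_{12.3}$$

as stated. From (42) and (44) it is also clear that the deviations
$x_{1.3}$ and $x_{2.3}$ are normally distributed with correlation $r_{12.3}$.
The above results may be extended to the case of n variables.*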
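
The agreement of the two definitions of partial correlation in the normal case can be illustrated numerically. The Python sketch below is an addition to the text. It rests on a standard property of the multivariate normal distribution which is not derived here: the covariance of $x_1$ and $x_2$ for a fixed $x_3$ is the covariance with the part explained by $x_3$ subtracted. The figures are again those of Examples XII, 1.

```python
import math

r12, r13, r23 = 0.70, 0.60, 0.40
s = [3.0, 4.0, 5.0]
r = [[1.0, r12, r13],
     [r12, 1.0, r23],
     [r13, r23, 1.0]]

# Covariance matrix of the three variables.
cov = [[r[i][j] * s[i] * s[j] for j in range(3)] for i in range(3)]

# Variances and covariance of x1, x2 in an array of constant x3.
c11 = cov[0][0] - cov[0][2] ** 2 / cov[2][2]
c22 = cov[1][1] - cov[1][2] ** 2 / cov[2][2]
c12 = cov[0][1] - cov[0][2] * cov[1][2] / cov[2][2]

conditional_r = c12 / math.sqrt(c11 * c22)
partial_r = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))  # (29)
```

The conditional correlation does not involve the chosen value of $x_3$ and coincides with $r_{12.3}$, while `c11` equals $\sigma_1^2(1 - r_{13}^2)$, the first-order variance given by a formula of the form (6).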

117. Significance of an observed partial correlation 

Fisher has proved that the sampling distribution of a partial
correlation coefficient† of order k, in samples from a normal population,
is of the same form as that of the correlation coefficient for
samples from a bivariate normal population, with the sample
number N reduced by k. In particular, when the partial correlation
in the population is zero, the square of the partial correlation
coefficient, r, in samples of N sets of values, is a $\beta_1(\tfrac{1}{2}, \tfrac{1}{2}(N - k - 2))$
variate. By the argument used in §89 it then follows that the
statistic t defined by

$$t = \frac{r\sqrt{N - k - 2}}{\sqrt{1 - r^2}}, \qquad (45)$$

conforms to the t distribution for $(N - k - 2)$ D.F. We are thus able
to test the significance of an observed partial correlation.

* Cf. Yule and Kendall, 1937, 1, pp. 282-4. † Fisher, 1924, 4.

Example 1. From a random sample of 21 sets of values from a normal
population the calculated value of a partial correlation of order three is 0.40.
Is this consistent with the assumption that the corresponding partial
correlation in the population is zero?

In applying the above test to our assumption we have $\nu = 21 - 3 - 2 = 16$,
and

$$t = \frac{0.40 \times 4}{\sqrt{0.84}} = \frac{1.6}{0.916} = 1.75.$$

This value of t is not significant at the 5 % level, so that the observed
correlation is not significant of correlation in the population.
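
The test of (45) is readily put in computational form. The Python sketch below is an addition to the text; the function name `partial_corr_t` is arbitrary, and the comparison with a critical value is left to the reader's table of t (for 16 D.F. the two-sided 5 % value is about 2.12).

```python
import math

def partial_corr_t(r, n_sets, order):
    """Return the statistic t of (45) and its degrees of freedom.

    r is the observed partial correlation, n_sets the sample number N and
    order the number k of secondary subscripts. The population partial
    correlation is assumed to be zero.
    """
    dof = n_sets - order - 2
    t = r * math.sqrt(dof) / math.sqrt(1 - r * r)
    return t, dof

t, dof = partial_corr_t(0.40, 21, 3)   # the data of Example 1
```

For these data the function gives 16 D.F. and a value of t close to 1.75, as found above.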

From the sampling distribution it follows, as in the case of two
variables, that if Fisher's z transformation of §94 is applied to the
above partial correlation of order k, the statistic z is distributed
nearly normally with variance $1/(N - k - 3)$. We are thus able to
test if an observed partial correlation differs significantly from some
assumed value. Similarly, we can test the significance of the
difference of the observed partial correlations in two independent
samples.

Example 2. From independent samples, of 32 and 23 sets of values,
partial correlations of order four are found to be 0.4 and 0.6 respectively.
Examine (i) whether the first value is consistent with the assumption of a
normal population with a corresponding correlation of 0.7, and (ii) whether
the two samples may be regarded as from the same normal population.

(i) From Table 7 we find that the values $r_1 = 0.4$ and $r = 0.7$ correspond
to $z_1 = 0.424$ and $z = 0.868$. The deviation of $z_1$ is therefore 0.444.
The S.E. of $z_1$ is $1/\sqrt{32 - 4 - 3} = 0.2$. Since the deviation of $z_1$ is greater than
twice the S.E. it is significant. The assumption of a correlation of 0.7 in the
population is thus ruled out.

(ii) Corresponding to $r_1 = 0.4$ and $r_2 = 0.6$ we have $z_1 = 0.424$ and
$z_2 = 0.693$, giving $z_2 - z_1 = 0.27$ nearly. The S.E. of the difference of the z's
is given by

$$\epsilon = \sqrt{\tfrac{1}{25} + \tfrac{1}{16}} = \sqrt{0.1025} = 0.32.$$

The difference $z_2 - z_1$, being less than $\epsilon$, is not significant. The samples may
thus be regarded as from the same population.
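
The figures of Example 2 may be reproduced without the table, since Fisher's z is the inverse hyperbolic tangent of r. The Python sketch below is an addition to the text; the function names are arbitrary, and the small differences from the tabular values are due to rounding.

```python
import math

def fisher_z(r):
    """Fisher's z transformation, z = artanh(r)."""
    return math.atanh(r)

def z_standard_error(n_sets, order):
    """S.E. of z for a partial correlation of the given order, 1/sqrt(N - k - 3)."""
    return 1 / math.sqrt(n_sets - order - 3)

# (i) First sample against an assumed population correlation of 0.7.
deviation = fisher_z(0.7) - fisher_z(0.4)
se_first = z_standard_error(32, 4)

# (ii) Difference between the two samples.
difference = fisher_z(0.6) - fisher_z(0.4)
se_difference = math.sqrt(z_standard_error(32, 4) ** 2
                          + z_standard_error(23, 4) ** 2)
```

The deviation in (i) exceeds twice its S.E. of 0.2, while the difference in (ii), about 0.27, is less than its S.E. of about 0.32.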

118. Significance of an observed multiple correlation 

Consider next the significance of an observed multiple correlation
coefficient, R, of the variable $x_1$ with the p variables $x_2, x_3, \ldots, x_{p+1}$,
calculated from a random sample of N sets of values from a multivariate
normal population. Fisher has found the general sampling
distribution* of R, and has shown that it depends, not on the whole
matrix of correlations between the variables, but simply on the
multiple correlation in the population and the sample size, N.
In particular, when the multiple correlation coefficient in the
population is zero,† $R^2$ is a $\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate. In virtue
of Theorem VII of §72 it follows that, in this case, $R^2/(1 - R^2)$ is a
$\beta_2(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate; and therefore, by the argument of §92,
the statistic F defined by

$$F = \frac{R^2}{1 - R^2} \cdot \frac{N - p - 1}{p} \qquad (46)$$

conforms to the F distribution for

$$\nu_1 = p, \qquad \nu_2 = N - p - 1. \qquad (47)$$

To test the hypothesis that the multiple correlation in the population
is zero, we have only to determine from the table of F whether
the value of this statistic calculated from the sample is significant.
This decides the significance of the observed multiple correlation.
We may also remark that, since $R^2$ is a $\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate
in samples from a normal population in which $x_1$ is uncorrelated
with any of the p variables $x_2, x_3, \ldots, x_{p+1}$, the mean value of $R^2$ is
given by

$$E(R^2) = \frac{p}{N - 1}.$$

The problem may also be approached from the point of view of
analysis of variance and covariance. The estimate $e_{1.(p)}$ of $x_1$, given
by the regression equation of $x_1$ on the p variables $x_2, \ldots, x_{p+1}$, is
of the form

$$e_{1.(p)} = \sum_{i=2}^{p+1} b_i x_i, \qquad (48)$$

and this is connected with the corresponding deviation $x_{1.(p)}$ by
the equation

$$x_1 = x_{1.(p)} + e_{1.(p)}. \qquad (49)$$

* Fisher, 1928, 1. See also Wilks, 1932, 1.
† See also Fisher, 1924, 6.

The N values of $x_{1.(p)}$ are connected by the $p + 1$ normal equations

$$\sum x_{1.(p)} = 0, \qquad \sum x_i x_{1.(p)} = 0 \quad (i = 2, \ldots, p + 1). \qquad (50)$$

Squaring (49) and summing over the distribution we see that

$$\sum x_1^2 = \sum x_{1.(p)}^2 + \sum e_{1.(p)}^2, \qquad (51)$$

the sum of products vanishing since

$$\sum x_{1.(p)} e_{1.(p)} = \sum_i b_i \sum x_i x_{1.(p)} = 0$$

in virtue of (50). The sum in the first member of (51) is equal to
$N S_1^2$, where $S_1^2$ is the variance of $x_1$ in the sample; while the first
sum on the right has the value $N S_1^2 (1 - R^2)$ by (23'). Consequently

$$\sum e_{1.(p)}^2 = N S_1^2 R^2.$$

The equations (50) may be regarded as $p + 1$ linear constraints on
the N values $x_{1.(p)}$. Then, on the assumption that these deviations
are normally distributed, and that R is zero in the population, the
sum $\sum x_{1.(p)}^2/\sigma^2$ is distributed like $\chi^2$ with $N - p - 1$ D.F., $\sigma^2$ being the
variance of $x_1$ in the population. But, by §77, $\sum x_1^2/\sigma^2$ is similarly
distributed with $N - 1$ D.F.; and therefore, by Theorem II of §81,
$\sum e_{1.(p)}^2/\sigma^2$ is distributed like $\chi^2$ with p D.F. Thus the two sums on
the right of (51), when divided by $N - p - 1$ and p respectively, give
independent and unbiased estimates of $\sigma^2$. Inserting their values
in terms of $R^2$ we see that the quotient of these estimates

$$F = \frac{R^2}{1 - R^2} \cdot \frac{N - p - 1}{p}$$

conforms to the F distribution for

$$\nu_1 = p, \qquad \nu_2 = N - p - 1.$$

We thus arrive at the same result as before.* (See also Ex. XII, 13).

Example. In a sample of 25 sets of values from a normal population
$R_{1(234)}$ was found to be 0.4. Show that this is not significant of correlation
in the population between $x_1$ and the variables $x_2$, $x_3$, $x_4$.

Here $p = 3$, $\nu_1 = 3$, $\nu_2 = 21$. Hence

$$F = \frac{0.16}{0.84} \times \frac{21}{3} = \frac{4}{3} = 1.33.$$

This is not significant, since the 5 % value of F is about 3.1.
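
The statistic (46) and its degrees of freedom (47) may be computed as in the following Python sketch, which is an addition to the text. The function name `multiple_corr_f` is arbitrary, and the comparison with the 5 % point is still made from the table of F.

```python
def multiple_corr_f(R, n_sets, p):
    """Return F of (46) with its degrees of freedom (47).

    R is the observed multiple correlation of x1 with p other variables in a
    sample of n_sets sets of values. The population value is assumed zero.
    """
    nu1, nu2 = p, n_sets - p - 1
    F = (R * R / (1 - R * R)) * (nu2 / nu1)
    return F, nu1, nu2

F, nu1, nu2 = multiple_corr_f(0.4, 25, 3)   # the data of the Example
```

For the data of the Example this gives $\nu_1 = 3$, $\nu_2 = 21$ and F equal to 4/3, in agreement with the working above.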

* For the distributions of various statistics occurring in this chapter see
Bartlett, 1933, 3.

EXAMPLES XII

1. In a trivariate distribution it is found that

$$\sigma_1 = 3, \quad \sigma_2 = 4, \quad \sigma_3 = 5; \qquad r_{23} = 0.40, \quad r_{31} = 0.60, \quad r_{12} = 0.70.$$

Prove that the partial correlations are

$$r_{23.1} = -0.035, \quad r_{31.2} = 0.49, \quad r_{12.3} = 0.63,$$

and that, if the variates are measured from their means, the linear
regression equations are



x l = 

Show also that 

$$\sigma_{1.23} = 1.87, \quad \sigma_{2.31} = 2.85, \quad \sigma_{3.12} = 4.00,$$



2. Show that 

tf'l 

and, more generally,



3. From §114 (38), deduce that



which shows how a partial correlation coefficient of order $n - 2$ may
be expressed in terms of multiple correlation coefficients of orders
$n - 1$ and $n - 2$.

4. Prove the identity 



$$= r_{12.3}\, r_{23.1}\, r_{31.2}.$$



5. Prove the formula 



expressing a regression coefficient in terms of coefficients of higher 
order. Write down the corresponding formula with subscripts 1 
and 2 interchanged, multiply together the two equations and take 
the square root, thus obtaining 



expressing a correlation coefficient in terms of coefficients of higher 
order. 

6. Prove the formulae of Ex. 5 with a group (k) of secondary 
subscripts added to each of the coefficients. 

7. Verify the values of $\phi$ expressed by §116 (43) and (44).

8. Show that, if $x_3 = a x_1 + b x_2$, the three partial correlations are
numerically equal to unity, $r_{13.2}$ having the sign of a, $r_{23.1}$ the sign
of b, and $r_{12.3}$ the opposite sign to a/b.

9. For a sample of 30 sets of values from a normal population,
$R_{1(23)}$ is found to be 0.5. Show that this is significant of correlation
in the population between $x_1$ and $x_2$, $x_3$. ($F = 4.5$, $\nu_1 = 2$, $\nu_2 = 27$.)

10. Show that a partial correlation $r_{12.34} = 0.5$, in a sample of
20 sets of values from a normal population, is significant at the 5 %
level.

11. Two independent samples, of 46 and 36 sets of values, give
corresponding partial correlations of order three as 0.41 and 0.66
respectively. Show that this is not inconsistent with the assumption
that the samples are from the same normal population. Show also
that an estimate of the partial correlation in the population, by the
method of §96, is 0.53.

12. Show that, corresponding to the first sample in §117, Ex. 2,
the 95 % fiducial limits for the partial correlation in the population
are 0.03 and 0.67.

13. Deduce from the argument on p. 259 that, when the multiple
correlation in the normal population is zero, $R^2/(1 - R^2)$
is a $\beta_2(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate and therefore, by §72, $R^2$ is a
$\beta_1(\tfrac{1}{2}p, \tfrac{1}{2}(N - p - 1))$ variate.

14. With the notation of §118 show that, when the multiple
correlation in the normal population is zero, the expected value of
R in the sample is $\Gamma(\tfrac{1}{2}(p + 1))$